Abstract. Motivated by current challenges in data-intensive sensor networks, we formulate
a coverage optimization problem for mobile visual sensors as a (constrained) repeated multi-player 
game. Each visual sensor tries to optimize its own coverage while minimizing the processing cost. We 
present two distributed learning algorithms where each sensor only remembers its own utility values 
and actions played during the last plays. These algorithms are proven to be convergent in probability
to the set of (constrained) Nash equilibria and global optima of a certain coverage performance metric,
respectively. 

1. Introduction. There is a widespread belief that continuous and pervasive 
monitoring will be possible in the near future with large numbers of networked, mo- 
bile, and wireless sensors. Thus, we are witnessing an intense research activity that 
focuses on the design of efficient control mechanisms for these systems. In particu- 
lar, decentralized algorithms would allow sensor networks to react autonomously to 
changes in the environment with minimal human supervision. 

A substantial body of research on sensor networks has concentrated on simple 
sensors that can collect scalar data; e.g., temperature, humidity or pressure data. 
Here, a main objective is the design of algorithms that can lead to optimal collective 
sensing through efficient motion control and communication schemes. However, scalar 
measurements can be insufficient in many situations; e.g., in automated surveillance 
or traffic monitoring applications. In contrast, data-intensive sensors such as cameras 
can collect visual data that are rich in information, thus having tremendous potential 
for monitoring applications, but at the cost of a higher processing overhead. 

Precisely, this paper aims to solve a coverage optimization problem taking into 
account part of the sensing/processing trade-off. Coverage optimization problems 
have mainly been formulated as cooperative problems where each sensor benefits 
from sensing the environment as a member of a group. However, sensing may also 
require expenditure; e.g., the energy consumed or the time spent by image processing 
algorithms in visual networks. Because of this, we endow each sensor with a utility 
function that quantifies this trade-off, formulating a coverage problem as a variation
of the congestion games in [26].

Literature review. In broad terms, the problem studied here is related to a bevy 
of sensor location and planning problems in the Computational Geometry, Geometric 
Optimization, and Robotics literature. For example, different variations on the
(combinatorial) Art Gallery problem include [25], [28], [30]. The objective here is to find
the optimum number of guards in a non-convex environment so that each point is
visible from at least one guard. A related set of references for the deployment of mobile
robots with omnidirectional cameras includes [10], [11]. Unlike the classic Art Gallery
algorithms, the latter papers assume that robots have local knowledge of the
environment and no recollection of the past. Other related references on robot deployment
in convex environments include [6], [16] for anisotropic and circular footprints.

The paper [1] is an excellent survey on multimedia sensor networks where the state 
of the art in algorithms, protocols, and hardware is surveyed, and open research issues 
are discussed in detail. As observed in [7], multimedia sensor networks enhance
traditional surveillance systems by enlarging, enhancing, and enabling multi-resolution
views. The investigation of coverage problems for static visual sensor networks is 
conducted in [5] [13] . 

Another set of relevant references to this paper comprise those on the use of game- 
theoretic tools to (i) solve static target assignment problems, and (ii) devise efficient 
and secure algorithms for communication networks. In [18], the authors present a
game-theoretic analysis of a coverage optimization problem for static sensor networks. 
This problem is equivalent to the weapon-target assignment problem in [24] which is 
nondeterministic polynomial-time complete. In general, the solution to assignment 
problems is hard from a combinatorial optimization viewpoint. 

Game Theory and Learning in Games are used to analyze a variety of fundamental 
problems in, e.g., wireless communication networks and the Internet. An incomplete
list of references includes [5] on power control, [27] on routing, and [H] on flow con- 
trol. However, there has been limited research on how to employ Learning in Games 
to develop distributed algorithms for mobile sensor networks. One exception is the 
paper [17] where the authors establish a link between cooperative control problems 
(in particular, consensus problems), and games (in particular, potential games and 
weakly acyclic games). 

Statement of contributions. The contributions of this paper pertain to both coverage
optimization problems and Learning in Games. Compared with [15] and [16], this
paper employs a more accurate sensing model and the results can be easily extended 
to include non-convex environments. Contrary to [15], we do not consider energy 
expenditure from sensor motions. 

Regarding Learning in Games, we extend the use of the payoff-based learning
dynamics in [19], [20]. The coverage game we consider here is shown to be a (constrained)
potential game. A number of learning rules, e.g., better (or best) reply dynamics and
adaptive play, have been proposed to reach Nash equilibria in potential games. In
these algorithms, each player must have access to the utility values induced by
alternative actions. In our problem set-up, however, this information is inaccessible because
of the information constraints caused by unknown rewards and by motion and sensing
limitations. To tackle this challenge, we develop two distributed payoff-based learning
algorithms where each sensor only remembers its own utility values and actions played
during the last plays.

In the first algorithm, at each time step, each sensor repeatedly updates its action 
synchronously, either trying some new action or selecting the action which corresponds 
to a higher utility value in the most recent two time steps. The first advantage of this 
algorithm over the payoff-based learning algorithms of [19] [20] is its simpler dynamics, 
which reduces the computational complexity. Furthermore, the algorithm employs a 
diminishing exploration rate (in contrast to the constant one in [19], [20]). The
dynamically changing exploration rate renders the algorithm an inhomogeneous Markov
chain (instead of the homogeneous ones in [19], [20]). This technical change allows us
to prove convergence in probability to the set of (constrained) Nash equilibria from 
which no agent is willing to unilaterally deviate. Thus, the property is substantially 
stronger than those in [19] [20] where the algorithms are guaranteed to converge to 
Nash equilibria with a sufficiently large probability by choosing a sufficiently small 
exploration rate in advance. 

The second algorithm is asynchronous. At each time step, only one sensor is 
active and updates its state by either trying some new action or selecting an action 
according to a Gibbs-like distribution from those played in the last two time steps when
it was active. The algorithm is shown to be convergent in probability to the set of
global maxima of a coverage performance metric. Compared with the synchronous
payoff-based log-linear learning algorithm in [19], this algorithm is asynchronous and
simpler. Furthermore, rather than maximizing the associated potential function, the
second algorithm optimizes a different global function which better captures a global
trade-off between the overall network benefit from sensing and the total energy the
network consumes. Again, by employing a diminishing exploration rate, our algorithm
is guaranteed to have stronger convergence properties than the ones in [19].

2. Problem formulation. Here, we first review some basic game-theoretic con- 
cepts; see, for example [5]. This will allow us to formulate subsequently an optimal 
coverage problem for mobile visual sensor networks as a repeated multi-player game. 
We then introduce notation used throughout the paper. 

2.1. Background in Game Theory. A strategic game $\Gamma := \langle V, A, U \rangle$ has three
components:

1. A set $V$ enumerating players $i \in V := \{1, \dots, N\}$.

2. An action set $A := \prod_{i=1}^{N} A_i$, the space of all action vectors, where $s_i \in A_i$
is the action of player $i$ and a (multi-player) action $s \in A$ has components
$s_1, \dots, s_N$.

3. The collection of utility functions $U$, where the utility function $u_i : A \to \mathbb{R}$
models player $i$'s preferences over action profiles.

Denote by $s_{-i}$ the action profile of all players other than $i$, and by $A_{-i} = \prod_{j \neq i} A_j$
the set of action profiles for all players except $i$. The concept of (pure) Nash
equilibrium (NE, for short) is the most important one in Non-cooperative Game Theory [9]
and is defined as follows.

Definition 2.1 (Nash equilibrium [9]). Consider the strategic game $\Gamma$. An
action profile $s^* := (s_i^*, s_{-i}^*)$ is a (pure) NE of the game $\Gamma$ if $\forall i \in V$ and $\forall s_i \in A_i$, it
holds that $u_i(s^*) \ge u_i(s_i, s_{-i}^*)$.

An action profile corresponding to an NE represents a scenario where no player 
has incentive to unilaterally deviate. Potential Games form an important class of 
strategic games where the change in a player's utility caused by a unilateral deviation 
can be measured by a potential function. 

Definition 2.2 (Potential game [23]). The strategic game $\Gamma$ is a potential game
with potential function $\phi : A \to \mathbb{R}$ if for every $i \in V$, for every $s_{-i} \in A_{-i}$, and for
every $s_i, s_i' \in A_i$, it holds that

$$\phi(s_i, s_{-i}) - \phi(s_i', s_{-i}) = u_i(s_i, s_{-i}) - u_i(s_i', s_{-i}). \qquad (2.1)$$

In conventional Non-cooperative Game Theory, all the actions in $A_i$ can always
be selected by player $i$ in response to other players' actions. However, in the context
of motion coordination, the actions available to player $i$ will often be constrained to
a state-dependent subset of $A_i$. In particular, we denote by $\mathcal{F}_i(s_i, s_{-i}) \subseteq A_i$ the set
of feasible actions of player $i$ when the action profile is $s := (s_i, s_{-i})$. We assume that
$\mathcal{F}_i(s_i, s_{-i}) \neq \emptyset$. Denote $\mathcal{F}(s) := \prod_{i \in V} \mathcal{F}_i(s) \subseteq A$, $\forall s \in A$, and
$\mathcal{F} := \cup \{\mathcal{F}(s) \mid s \in A\}$.
The introduction of $\mathcal{F}$ leads naturally to the notion of constrained strategic game
$\Gamma_{\mathrm{res}} := \langle V, A, U, \mathcal{F} \rangle$, and the following associated concepts.

Definition 2.3 (Constrained Nash equilibrium). Consider the constrained strategic
game $\Gamma_{\mathrm{res}}$. An action profile $s^*$ is a constrained (pure) NE of the game $\Gamma_{\mathrm{res}}$ if
$\forall i \in V$ and $\forall s_i \in \mathcal{F}_i(s_i^*, s_{-i}^*)$, it holds that $u_i(s^*) \ge u_i(s_i, s_{-i}^*)$.

Definition 2.4 (Constrained potential game). The game $\Gamma_{\mathrm{res}}$ is a constrained
potential game with potential function $\phi(s)$ if for every $i \in V$, every $s_{-i} \in A_{-i}$, and
every $s_i \in A_i$, the equality (2.1) holds for every $s_i' \in \mathcal{F}_i(s_i, s_{-i})$.

Observe that if $s^*$ is an NE of the strategic game $\Gamma$, then it is also a constrained
NE of the constrained strategic game $\Gamma_{\mathrm{res}}$. For any given strategic game, NE may
not exist. However, the existence of NE in potential games is guaranteed [23] . Hence, 
any constrained potential game has at least one constrained NE. 
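As an illustration of Definitions 2.1 and 2.2, the following minimal sketch checks identity (2.1) by brute force and enumerates the pure NE of a small game. The two-player toy game, the values in `W`, and all function names are hypothetical and are not taken from the paper.

```python
from itertools import product

def is_potential(actions, utils, phi, tol=1e-12):
    """Check identity (2.1) for every player and every unilateral deviation."""
    for s in product(*actions):
        for i, acts in enumerate(actions):
            for alt in acts:
                t = s[:i] + (alt,) + s[i + 1:]
                if abs((phi(s) - phi(t)) - (utils[i](s) - utils[i](t))) > tol:
                    return False
    return True

def pure_nash(actions, utils):
    """Enumerate pure Nash equilibria: no player gains by deviating alone."""
    equilibria = []
    for s in product(*actions):
        stable = all(
            utils[i](s) >= utils[i](s[:i] + (alt,) + s[i + 1:])
            for i, acts in enumerate(actions)
            for alt in acts
        )
        if stable:
            equilibria.append(s)
    return equilibria

# Hypothetical toy game: each of two players picks cell 0 or 1,
# and the value W[q] of a cell is shared equally by the players on it.
W = {0: 4.0, 1: 3.0}
actions = [(0, 1), (0, 1)]

def make_utility(i):
    return lambda s: W[s[i]] / s.count(s[i])

utils = [make_utility(0), make_utility(1)]

def phi(s):
    # Candidate potential: for each cell, W_q * (1 + 1/2 + ... + 1/n_q).
    return sum(W[q] * sum(1.0 / l for l in range(1, s.count(q) + 1)) for q in W)
```

In this toy game the two equilibria are the profiles in which the players occupy different cells, and they coincide with the maximizers of `phi`.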

2.2. Coverage problem formulation. 

2.2.1. Mission space. We consider a convex 2-D mission space that is
discretized into a (square) lattice. We assume that each square of the lattice has
unit dimensions. Each square will be labeled with the coordinate of its center
$q = (q_x, q_y)$, where $q_x \in [q_{x_{\min}}, q_{x_{\max}}]$ and $q_y \in [q_{y_{\min}}, q_{y_{\max}}]$, for some integers
$q_{x_{\min}}, q_{y_{\min}}, q_{x_{\max}}, q_{y_{\max}}$. Denote by $Q$ the collection of all squares of the lattice.

We now define an associated location graph $\mathcal{G}_{\mathrm{loc}} := (Q, E_{\mathrm{loc}})$ where
$((q_x, q_y), (q_{x'}, q_{y'})) \in E_{\mathrm{loc}}$ if and only if $|q_x - q_{x'}| + |q_y - q_{y'}| = 1$ for
$(q_x, q_y), (q_{x'}, q_{y'}) \in Q$.
Note that the graph $\mathcal{G}_{\mathrm{loc}}$ is undirected; i.e., $(q, q') \in E_{\mathrm{loc}}$ if and only if $(q', q) \in E_{\mathrm{loc}}$.
The set of neighbors of $q$ in $E_{\mathrm{loc}}$ is given by
$\mathcal{N}_q^{\mathrm{loc}} := \{q' \in Q \setminus \{q\} \mid (q, q') \in E_{\mathrm{loc}}\}$. We
assume that the location graph $\mathcal{G}_{\mathrm{loc}}$ is fixed and connected, and denote its diameter
by $D$.
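The lattice, its 4-neighbor location graph, and the diameter $D$ can be sketched as follows. This is only an illustration under the definitions above; the function names and the rectangular example are assumptions, and the diameter is computed by running a breadth-first search from every cell.

```python
from collections import deque

def lattice(qx_min, qx_max, qy_min, qy_max):
    """All cell centers (q_x, q_y) with integer coordinates in the given ranges."""
    return [(x, y) for x in range(qx_min, qx_max + 1)
                   for y in range(qy_min, qy_max + 1)]

def neighbors(q, cells):
    """Cells at Manhattan distance exactly 1 from q (edges of the location graph)."""
    x, y = q
    return [p for p in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)) if p in cells]

def diameter(cells):
    """Largest shortest-path distance between any two cells (BFS from each cell)."""
    cell_set = set(cells)
    best = 0
    for src in cells:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            q = queue.popleft()
            for p in neighbors(q, cell_set):
                if p not in dist:
                    dist[p] = dist[q] + 1
                    queue.append(p)
        best = max(best, max(dist.values()))
    return best

Q = lattice(0, 3, 0, 2)   # hypothetical 4 x 3 mission space
D = diameter(Q)
```

For a full rectangular lattice the diameter equals the Manhattan distance between opposite corners, which is what the sketch returns for the example above.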

Agents are deployed in $Q$ to detect certain events of interest. As agents move
in $Q$ and process measurements, they will assign a numerical value $W_q \ge 0$ to the
events in each square with center $q \in Q$. If $W_q = 0$, then there is no significant event
at the square with center $q$. The larger the value of $W_q$ is, the more interesting the set
of events at the square with center $q$ is. Later, the amount $W_q$ will be identified
with the benefit of observing the point $q$. In this set-up, we assume the values $W_q$ to
be constant in time. Furthermore, $W_q$ is not known a priori to the agents, but the
agents can measure this value through sensing the point $q$.

2.2.2. Modeling of the visual sensor nodes. Each mobile agent $i$ is modeled
as a point mass in $Q$, with location $a_i := (x_i, y_i) \in Q$. Each agent has a mounted
pan-tilt-zoom camera, and can adjust its orientation and focal length.

The visual sensing range of a camera is directional, limited-range, and has a finite
angle of view. Following a geometric simplification, we model the visual sensing region
of agent $i$ as an annulus sector in the 2-D plane; see Figure 2.1.

The visual sensor footprint is completely characterized by the following
parameters: the position of agent $i$, $a_i \in Q$, the camera orientation, $\theta_i \in [0, 2\pi)$, the camera
angle of view, $\alpha_i \in [\alpha_{\min}, \alpha_{\max}]$, and the shortest range (resp. longest range)
between agent $i$ and the nearest (resp. farthest) object that can be recognized from
the image, $r_i^{\mathrm{shrt}} \in [r_{\min}, r_{\max}]$ (resp. $r_i^{\mathrm{lng}} \in [r_{\min}, r_{\max}]$). The parameters
$r_i^{\mathrm{shrt}}$, $r_i^{\mathrm{lng}}$, $\alpha_i$ can be tuned by changing the focal length $\mathrm{FL}_i$ of agent $i$'s camera. In this way,
$c_i := (\mathrm{FL}_i, \theta_i) \in [0, \mathrm{FL}_{\max}] \times [0, 2\pi)$ is the camera control vector of agent $i$. In what
follows, we will assume that $c_i$ takes values in a finite subset
$\mathcal{C} \subset [0, \mathrm{FL}_{\max}] \times [0, 2\pi)$.
An agent action is thus a vector $s_i := (a_i, c_i) \in A_i := Q \times \mathcal{C}$, and a multi-agent action
is denoted by $s = (s_1, \dots, s_N) \in A := \prod_{i=1}^{N} A_i$.

Let $\mathcal{D}(a_i, c_i)$ be the visual sensor footprint of agent $i$. Now we can define a
proximity sensing graph (see [4] for a definition of proximity graph)
$\mathcal{G}_{\mathrm{sen}}(s) := (V, E_{\mathrm{sen}}(s))$ as follows: the set of neighbors of
agent $i$, $\mathcal{N}_i^{\mathrm{sen}}(s)$, is given as
$\mathcal{N}_i^{\mathrm{sen}}(s) := \{j \in V \setminus \{i\} \mid \mathcal{D}(a_i, c_i) \cap \mathcal{D}(a_j, c_j) \cap Q \neq \emptyset\}$.

Fig. 2.1. Visual sensor footprint and a configuration of the mobile sensor network.

Each agent is able to communicate with others to exchange information. We
assume that the communication range of agents is $2 r_{\max}$. This induces a $2 r_{\max}$-disk
communication graph $\mathcal{G}_{\mathrm{comm}}(s) := (V, E_{\mathrm{comm}}(s))$ as follows: the set of neighbors of
agent $i$ is given by
$\mathcal{N}_i^{\mathrm{comm}}(s) := \{j \in V \setminus \{i\} \mid (x_i - x_j)^2 + (y_i - y_j)^2 \le (2 r_{\max})^2\}$.
Note that $\mathcal{G}_{\mathrm{comm}}(s)$ is undirected and that $\mathcal{G}_{\mathrm{sen}}(s) \subseteq \mathcal{G}_{\mathrm{comm}}(s)$.

The motion of agents will be limited to a neighboring point in $\mathcal{G}_{\mathrm{loc}}$ at each time
step. Thus, the feasible action set of an agent will be given by
$\mathcal{F}_i(a_i) := (\{a_i\} \cup \mathcal{N}_{a_i}^{\mathrm{loc}}) \times \mathcal{C}$.
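The feasible action set, which pairs the current cell and its lattice neighbors with every camera control vector, can be enumerated as in the sketch below. The 3 x 3 mission space and the two camera settings are hypothetical values chosen only for illustration.

```python
from itertools import product

def feasible_actions(a, cells, controls):
    """Pairs (position, control): stay at a or move to a lattice neighbor, any control."""
    x, y = a
    candidates = [(x, y), (x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    positions = [p for p in candidates if p in cells]
    return list(product(positions, controls))

# Hypothetical 3 x 3 mission space and two camera settings (focal length, orientation).
cells = {(x, y) for x in range(3) for y in range(3)}
controls = [(1.0, 0.0), (1.0, 3.14159)]

corner = feasible_actions((0, 0), cells, controls)   # 3 positions x 2 controls
center = feasible_actions((1, 1), cells, controls)   # 5 positions x 2 controls
```

Because the current position is always included, the feasible set is never empty, in line with the non-emptiness assumption made for constrained games.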

2.2.3. Coverage game. We now proceed to formulate a coverage optimization
problem as a constrained strategic game. For each $q \in Q$, we denote by $n_q(s)$ the
cardinality of the set $\{k \in V \mid q \in \mathcal{D}(a_k, c_k) \cap Q\}$; i.e., the number of agents which
can observe the point $q$. The "profit" given by $W_q$ will be equally shared by agents
that can observe the point $q$. The benefit that agent $i$ obtains through sensing is thus
defined by $\sum_{q \in \mathcal{D}(a_i, c_i) \cap Q} \frac{W_q}{n_q(s)}$.

On the other hand, and as argued in [21], the processing of visual data can incur
a higher cost than that of communication. This is in contrast with scalar sensor
networks, where the communication cost dominates. With this observation, we model
the energy consumption of agent $i$ by
$f_i(c_i) := \frac{1}{2} \alpha_i \big( (r_i^{\mathrm{lng}})^2 - (r_i^{\mathrm{shrt}})^2 \big)$. This measure
corresponds to the area of the visual sensor footprint and can serve to approximate
the energy consumption or the cost incurred by image processing algorithms.
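The cost model is the area of an annulus sector, so it can be evaluated directly. The sketch below assumes the angle of view is given in radians; the function name and the input validation are illustrative choices, not part of the paper.

```python
import math

def processing_cost(alpha, r_short, r_long):
    """Area of an annulus sector with angle alpha (radians) between radii r_short and r_long."""
    if alpha < 0 or not (0 <= r_short <= r_long):
        raise ValueError("expected alpha >= 0 and 0 <= r_short <= r_long")
    return 0.5 * alpha * (r_long ** 2 - r_short ** 2)

quarter = processing_cost(math.pi / 2, 1.0, 3.0)   # quarter annulus between radii 1 and 3
disk = processing_cost(2 * math.pi, 0.0, 1.0)      # degenerate case: full unit disk
```

A wider angle of view or a longer range increases the cost, which is the trade-off the utility function is meant to capture.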

We will endow each agent with a utility function that aims to capture the above
sensing/processing trade-off. In this way, we define a utility function for agent $i$ by

$$u_i(s) = \sum_{q \in \mathcal{D}(a_i, c_i) \cap Q} \frac{W_q}{n_q(s)} - f_i(c_i).$$



Note that the utility function $u_i$ is local over the visual sensing graph $\mathcal{G}_{\mathrm{sen}}(s)$; i.e., it
is only dependent on the actions of $\{i\} \cup \mathcal{N}_i^{\mathrm{sen}}(s)$. With the set of utility functions
$U_{\mathrm{cov}} = \{u_i\}_{i \in V}$, and feasible action set
$\mathcal{F}_{\mathrm{cov}} = \prod_{i=1}^{N} \bigcup_{a_i \in A_i} \mathcal{F}_i(a_i)$, we now have all
the ingredients to introduce the coverage game
$\Gamma_{\mathrm{cov}} := \langle V, A, U_{\mathrm{cov}}, \mathcal{F}_{\mathrm{cov}} \rangle$. This game
is a variation of the congestion games introduced in [26].
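A minimal sketch of the utility computation follows. It assumes, purely for illustration, that the orientation $\theta_i$ points along the bisector of the annulus sector and that a cell is observed when its center lies in the sector; neither convention is stated in the text, and all names are hypothetical.

```python
import math

def footprint(a, theta, alpha, r_short, r_long, cells):
    """Cells whose centers lie in the annulus sector at a (assumed bisected by theta)."""
    observed = set()
    for q in cells:
        dx, dy = q[0] - a[0], q[1] - a[1]
        r = math.hypot(dx, dy)
        if not (r_short <= r <= r_long):
            continue
        # Signed angular offset from the camera axis, wrapped to [-pi, pi).
        offset = (math.atan2(dy, dx) - theta + math.pi) % (2 * math.pi) - math.pi
        if abs(offset) <= alpha / 2 + 1e-9:   # small tolerance for boundary cells
            observed.add(q)
    return observed

def utility(i, footprints, costs, W):
    """Equally shared benefit over agent i's footprint minus its processing cost."""
    benefit = 0.0
    for q in footprints[i]:
        n_q = sum(1 for D in footprints if q in D)   # number of agents observing q
        benefit += W[q] / n_q
    return benefit - costs[i]

# Hypothetical 5 x 5 lattice centered at the origin.
cells = {(x, y) for x in range(-2, 3) for y in range(-2, 3)}
fp = footprint((0, 0), 0.0, math.pi / 2, 1.0, 2.0, cells)

# Two hand-written footprints that overlap on the cell (2, 0).
footprints = [{(1, 0), (2, 0)}, {(2, 0), (3, 0)}]
W = {(1, 0): 2.0, (2, 0): 2.0, (3, 0): 2.0}
costs = [0.5, 0.0]
```

Note that `utility` only needs the footprints that intersect agent i's own footprint, which reflects the locality of $u_i$ over the sensing graph.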

Lemma 2.5. The coverage game $\Gamma_{\mathrm{cov}}$ is a constrained potential game with
potential function

$$\phi(s) = \sum_{q \in Q} \sum_{\ell=1}^{n_q(s)} \frac{W_q}{\ell} - \sum_{i=1}^{N} f_i(c_i).$$



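Proof. The proof is a slight variation of that in [26]. Consider any $s := (s_i, s_{-i}) \in A$
where $s_i := (a_i, c_i)$. We fix $i \in V$ and pick any $s_i' = (a_i', c_i')$ from $\mathcal{F}_i(a_i)$. Denote
$s' := (s_i', s_{-i})$, $\Omega_1 := (\mathcal{D}(a_i, c_i) \setminus \mathcal{D}(a_i', c_i')) \cap Q$ and
$\Omega_2 := (\mathcal{D}(a_i', c_i') \setminus \mathcal{D}(a_i, c_i)) \cap Q$.

Observe that

$$\begin{aligned}
\phi(s_i, s_{-i}) - \phi(s_i', s_{-i})
&= \sum_{q \in \Omega_1} \Big( \sum_{\ell=1}^{n_q(s)} \frac{W_q}{\ell} - \sum_{\ell=1}^{n_q(s')} \frac{W_q}{\ell} \Big)
 + \sum_{q \in \Omega_2} \Big( \sum_{\ell=1}^{n_q(s)} \frac{W_q}{\ell} - \sum_{\ell=1}^{n_q(s')} \frac{W_q}{\ell} \Big)
 - f_i(c_i) + f_i(c_i') \\
&= \sum_{q \in \Omega_1} \frac{W_q}{n_q(s)} - \sum_{q \in \Omega_2} \frac{W_q}{n_q(s')} - f_i(c_i) + f_i(c_i') \\
&= u_i(s_i, s_{-i}) - u_i(s_i', s_{-i}),
\end{aligned}$$

where in the second equality we utilize the fact that for each $q \in \Omega_1$, $n_q(s) = n_q(s') + 1$,
and for each $q \in \Omega_2$, $n_q(s') = n_q(s) + 1$. □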
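The identity behind Lemma 2.5 can also be checked numerically. The sketch below draws random footprints (arbitrary sets of cells rather than annulus sectors, which is an assumption made to keep the example short), changes one agent's footprint and cost, and compares the change in the potential with the change in that agent's utility.

```python
import random

def n_q(q, footprints):
    """Number of agents whose footprint contains the cell q."""
    return sum(1 for D in footprints if q in D)

def utility(i, footprints, costs, W):
    return sum(W[q] / n_q(q, footprints) for q in footprints[i]) - costs[i]

def potential(footprints, costs, W):
    """phi(s): sum over cells of W_q * (1 + 1/2 + ... + 1/n_q(s)) minus total cost."""
    covered = set().union(*footprints)
    harmonic = sum(
        W[q] * sum(1.0 / l for l in range(1, n_q(q, footprints) + 1))
        for q in covered
    )
    return harmonic - sum(costs)

def max_identity_gap(trials=200, seed=0, n_agents=3):
    """Largest |(phi(s) - phi(s')) - (u_i(s) - u_i(s'))| over random unilateral deviations."""
    rng = random.Random(seed)
    cells = [(x, y) for x in range(4) for y in range(4)]
    W = {q: float(rng.randint(0, 5)) for q in cells}
    worst = 0.0
    for _ in range(trials):
        fps = [set(rng.sample(cells, rng.randint(1, 6))) for _ in range(n_agents)]
        costs = [rng.uniform(0.0, 2.0) for _ in range(n_agents)]
        i = rng.randrange(n_agents)
        new_fps, new_costs = list(fps), list(costs)
        new_fps[i] = set(rng.sample(cells, rng.randint(1, 6)))   # agent i deviates
        new_costs[i] = rng.uniform(0.0, 2.0)
        lhs = potential(fps, costs, W) - potential(new_fps, new_costs, W)
        rhs = utility(i, fps, costs, W) - utility(i, new_fps, new_costs, W)
        worst = max(worst, abs(lhs - rhs))
    return worst
```

The gap returned by `max_identity_gap` should be at the level of floating-point rounding, since cells observed both before and after the deviation contribute equally to both sides.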

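We denote by $\mathcal{E}(\Gamma_{\mathrm{cov}})$ the set of constrained NEs of $\Gamma_{\mathrm{cov}}$. It is worth mentioning
that $\mathcal{E}(\Gamma_{\mathrm{cov}}) \neq \emptyset$ due to the fact that $\Gamma_{\mathrm{cov}}$ is a constrained potential game.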

Remark 2.1. The assumptions of our problem formulation admit several
extensions. For example, it is straightforward to extend our results to non-convex 3-D
spaces. This is because the results that follow can also handle other shapes of the
sensor footprint; e.g., a complete disk, a subset of the annulus sector. On the other hand,
note that the coverage problem can be interpreted as a target assignment problem;
here, the value $W_q \ge 0$ would be associated with the value of a target located at the
point $q$. •

2.3. Notations. In the following, we will use the Landau symbol, $O$, as in
$O(\epsilon^k)$, for some $k \ge 0$. This implies that $0 < \lim_{\epsilon \to 0^+} \frac{O(\epsilon^k)}{\epsilon^k} < +\infty$. We denote
by $\mathrm{diag}\, A := \{(s, s) \in A^2 \mid s \in A\}$ and
$\mathrm{diag}\, \mathcal{E}(\Gamma_{\mathrm{cov}}) := \{(s, s) \in A^2 \mid s \in \mathcal{E}(\Gamma_{\mathrm{cov}})\}$.

Consider $a, a' \in Q^N$ where $a_i \neq a_i'$ and $a_{-i} = a_{-i}'$ for some $i \in V$. The transition
$a \to a'$ is feasible if and only if $(a_i, a_i') \in E_{\mathrm{loc}}$. A feasible path from $a$ to $a'$ consisting
of multiple feasible transitions is denoted by $a \Rightarrow a'$. Let $\diamond a := \{a' \in Q^N \mid a \Rightarrow a'\}$ be
the reachable set from $a$.

Let $s = (a, c), s' = (a', c') \in A$ where $a_i \neq a_i'$ and $a_{-i} = a_{-i}'$ for some $i \in V$.
The transition $s \to s'$ is feasible if and only if $s_i' \in \mathcal{F}_i(a_i)$. A feasible path from
$s$ to $s'$ consisting of multiple feasible transitions is denoted by $s \Rightarrow s'$. Finally,
$\diamond s := \{s' \in A \mid s \Rightarrow s'\}$ will be the reachable set from $s$.

3. Distributed coverage learning algorithms and convergence results. 

In our coverage problem, we assume that $W_q$ is unknown in advance. Furthermore,
due to the limitations of motion and sensing, each agent is unable to obtain the
information of $W_q$ if the point $q$ is outside its sensing range. These information
constraints render each agent unable to access the utility values induced by
alternative actions. Thus action-based learning algorithms, e.g., the better (or best)
reply learning algorithm and the adaptive play learning algorithm, cannot be employed
to solve our coverage games. This motivates us to design distributed learning algorithms
which only require the payoff received.

In this section, we propose two distributed payoff-based learning algorithms,
namely the Distributed Inhomogeneous Synchronous Coverage Learning Algorithm (DISCL,
for short) and the Distributed Inhomogeneous Asynchronous Coverage Learning Algorithm
(DIACL, for short). We then present their convergence properties. Relevant
algorithms include the payoff-based learning algorithms proposed in [19], [20].

3.1. Distributed Inhomogeneous Synchronous Coverage Learning
Algorithm. For each $t \ge 1$ and $i \in V$, we define $\tau_i(t)$ as follows: $\tau_i(t) = t$ if
$u_i(s(t)) \ge u_i(s(t - 1))$, otherwise, $\tau_i(t) = t - 1$. Here, $s_i(\tau_i(t))$ is the more
successful action of agent $i$ in the last two steps. The main steps of the DISCL algorithm
are the following:

1: [Initialization:] At $t = 0$, all agents are uniformly placed in $Q$. Each agent $i$
uniformly chooses its camera control vector $c_i$ from the set $\mathcal{C}$, communicates with
agents in $\mathcal{N}_i^{\mathrm{sen}}(s(0))$, and computes $u_i(s(0))$. At $t = 1$, all the agents keep their
actions.

2: [Update:] At each time $t \ge 2$, each agent $i$ updates its state according to the
following rules:

• Agent $i$ chooses the exploration rate $\epsilon(t) = t^{-\frac{1}{N(D+1)}}$ and computes $s_i(\tau_i(t))$.

• With probability $\epsilon(t)$, agent $i$ experiments, and chooses the temporary
action $s_i^{tp} := (a_i^{tp}, c_i^{tp})$ uniformly from the set $\mathcal{F}_i(a_i(t)) \setminus \{s_i(\tau_i(t))\}$.

• With probability $1 - \epsilon(t)$, agent $i$ does not experiment, and sets
$s_i^{tp} = s_i(\tau_i(t))$.

• After $s_i^{tp}$ is chosen, agent $i$ moves to the position $a_i^{tp}$ and sets the camera
control vector to $c_i^{tp}$.

3: [Communication and computation:] At position $a_i^{tp}$, agent $i$ communicates with
agents in $\mathcal{N}_i^{\mathrm{sen}}(s_i^{tp}, s_{-i}^{tp})$, and computes $u_i(s_i^{tp}, s_{-i}^{tp})$ and $\mathcal{F}_i(a_i^{tp})$.

4: Repeat Steps 2 and 3.
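The update rule above can be sketched in a few lines. The example is a simplification under stated assumptions: the exploration rate is taken to be $t^{-1/(N(D+1))}$, actions are reduced to positions on a hypothetical line of five cells (so $D = 4$), each agent observes only its own cell, and the cell values in `W` are invented. It is not the authors' implementation.

```python
import random

def exploration_rate(t, n_agents, diameter):
    """Diminishing exploration rate, assumed here to be t ** (-1 / (N (D + 1)))."""
    return t ** (-1.0 / (n_agents * (diameter + 1)))

def discl_step(t, s_prev, s_curr, utility, feasible, diameter, rng):
    """One synchronous update: every agent explores or replays its better recent action."""
    n_agents = len(s_curr)
    eps = exploration_rate(t, n_agents, diameter)
    nxt = []
    for i in range(n_agents):
        # More successful action over the last two steps (ties favor the latest one).
        better = s_curr[i] if utility(i, s_curr) >= utility(i, s_prev) else s_prev[i]
        options = [a for a in feasible(i, s_curr[i]) if a != better]
        if options and rng.random() < eps:
            nxt.append(rng.choice(options))   # experiment with a new feasible action
        else:
            nxt.append(better)                # keep the better of the last two actions
    return tuple(nxt)

# Hypothetical toy problem: positions on a line of 5 cells, value shared equally.
W = [1.0, 0.0, 5.0, 0.0, 3.0]

def utility(i, s):
    return W[s[i]] / s.count(s[i])

def feasible(i, a):
    return [b for b in (a - 1, a, a + 1) if 0 <= b < len(W)]

def run(steps=200, seed=0):
    rng = random.Random(seed)
    s_prev = s_curr = (0, 0)        # s(0) = s(1): agents keep their actions at t = 1
    history = [s_prev, s_curr]
    for t in range(2, steps):
        s_prev, s_curr = s_curr, discl_step(t, s_prev, s_curr, utility, feasible, 4, rng)
        history.append(s_curr)
    return history
```

Each agent uses only its own last two utility values, which is the payoff-based information structure the section argues for; the sketch makes no claim about how quickly the toy run settles.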

Remark 3.1. A variation of the DISCL algorithm corresponds to $\epsilon(t) = \epsilon \in (0, \frac{1}{2}]$
constant for all $t \ge 2$. If this is the case, we will refer to the algorithm as the Distributed
Homogeneous Synchronous Coverage Learning Algorithm (DHSCL, for short). Later,
the convergence analysis of the DISCL algorithm will be based on the analysis of the
DHSCL algorithm. •

Denote the space $\mathcal{B} := \{(s, s') \in A \times A \mid s_i' \in \mathcal{F}_i(a_i), \forall i \in V\}$. Observe that
$z(t) := (s(t - 1), s(t))$ in the DISCL algorithm constitutes a time-inhomogeneous
Markov chain $\{\mathcal{P}_t\}$ on the space $\mathcal{B}$. The following theorem states that the DISCL
algorithm asymptotically converges to the set $\mathcal{E}(\Gamma_{\mathrm{cov}})$ in probability.

Theorem 3.1. Consider the Markov chain $\{\mathcal{P}_t\}$ induced by the DISCL
Algorithm. It holds that $\lim_{t \to +\infty} \mathbb{P}(z(t) \in \mathrm{diag}\, \mathcal{E}(\Gamma_{\mathrm{cov}})) = 1$.

The proof of Theorem 3.1 is provided in Section 4.

Remark 3.2. The DISCL algorithm is simpler than the payoff-based learning
algorithm proposed in [20], reducing the computational complexity. The algorithm
studied in [20] converges to the set of NEs with an arbitrarily high probability by choosing
an arbitrarily small exploration rate $\epsilon$ in advance. However, it is difficult to derive
the relation between the convergence probability and the exploration rate. This motivates
us to utilize a diminishing exploration rate in the DISCL algorithm, which induces a
time-inhomogeneous Markov chain in contrast to the time-homogeneous Markov chain
in [20]. This change renders a stronger convergence property, i.e., the convergence to
the set of NEs in probability. •

3.2. Distributed Inhomogeneous Asynchronous Coverage Learning
Algorithm. Lemma 2.5 shows that the coverage game $\Gamma_{\mathrm{cov}}$ is a constrained
potential game with potential function $\phi(s)$. However, this potential function is not a
straightforward measure of the network coverage performance. On the other hand,
the objective function $U_g(s) := \sum_{i \in V} u_i(s)$ captures the trade-off between the
overall network benefit from sensing and the total energy the network consumes, and
thus can be perceived as a more natural coverage performance metric. Denote by
$S^* := \mathrm{argmax}_{s \in A}\, U_g(s)$ the set of global maximizers of $U_g(s)$. In this part,
we present the DIACL algorithm, which is convergent in probability to the set $S^*$.

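Before that, we first introduce some notation for the DIACL algorithm. Denote
by $\mathcal{B}'$ the space $\mathcal{B}' := \{(s, s') \in A \times A \mid s_{-i} = s_{-i}',\ s_i' \in \mathcal{F}_i(a_i) \text{ for some } i \in V\}$. For
any $s^0, s^1 \in A$ with $s^0_{-i} = s^1_{-i}$ for some $i \in V$, we denote

$$\Delta_i(s^1, s^0) := \frac{1}{2} \sum_{q \in \Omega_1} \frac{W_q}{n_q(s^1)} - \frac{1}{2} \sum_{q \in \Omega_2} \frac{W_q}{n_q(s^0)},$$

where $\Omega_1 := \mathcal{D}(a_i^1, c_i^1) \setminus \mathcal{D}(a_i^0, c_i^0) \cap Q$ and
$\Omega_2 := \mathcal{D}(a_i^0, c_i^0) \setminus \mathcal{D}(a_i^1, c_i^1) \cap Q$, and

$$\rho_i(s^0, s^1) := u_i(s^1) - \Delta_i(s^1, s^0) - u_i(s^0) + \Delta_i(s^0, s^1),$$

$$\Psi_i(s^0, s^1) := \max\{u_i(s^0) - \Delta_i(s^0, s^1),\ u_i(s^1) - \Delta_i(s^1, s^0)\},$$

$$m^* := \max_{(s^0, s^1) \in \mathcal{B}',\ s_i^0 \neq s_i^1} \Big\{ \Psi_i(s^0, s^1) - \big(u_i(s^0) - \Delta_i(s^0, s^1)\big),\ \frac{1}{2} \Big\}.$$

It is easy to check that $\Delta_i(s^1, s^0) = -\Delta_i(s^0, s^1)$ and $\Psi_i(s^0, s^1) = \Psi_i(s^1, s^0)$. Assume
that at each time instant, one of the agents becomes active with equal probability. Denote
by $\gamma_i(t)$ the last time instant before $t$ when agent $i$ was active. We then denote
$\gamma_i^{(2)}(t) := \gamma_i(\gamma_i(t))$. The main steps of the DIACL algorithm are described in the
following.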

1: [Initialization:] At $t = 0$, all agents are uniformly placed in $Q$. Each agent $i$
uniformly chooses the camera control vector $c_i$ from the set $\mathcal{C}$, and then
communicates with agents in $\mathcal{N}_i^{\mathrm{sen}}(s(0))$ and computes $u_i(s(0))$. Furthermore, each
agent $i$ chooses $m_i \in (2m^*, Km^*]$ for some $K \ge 2$. At $t = 1$, all the sensors keep
their actions.

2: [Update:] Assume that agent $i$ is active at time $t \ge 2$. Then agent $i$ updates its
state according to the following rules:

• Agent $i$ chooses the exploration rate $\epsilon(t) = t^{-\frac{1}{(D+1)(K+1)m^*}}$.

• With probability $\epsilon(t)^{m_i}$, agent $i$ experiments and uniformly chooses
$s_i^{tp} := (a_i^{tp}, c_i^{tp})$ from the action set
$\mathcal{F}_i(a_i(t)) \setminus \{s_i(t),\ s_i(\gamma_i^{(2)}(t) + 1)\}$.

• With probability $1 - \epsilon(t)^{m_i}$, agent $i$ does not experiment and chooses $s_i^{tp}$
according to the following probability distribution:

$$\mathbb{P}(s_i^{tp} = s_i(t)) = \frac{1}{1 + \epsilon(t)^{\rho_i(s_i(\gamma_i^{(2)}(t) + 1),\, s_i(t))}},$$

$$\mathbb{P}(s_i^{tp} = s_i(\gamma_i^{(2)}(t) + 1)) = \frac{\epsilon(t)^{\rho_i(s_i(\gamma_i^{(2)}(t) + 1),\, s_i(t))}}{1 + \epsilon(t)^{\rho_i(s_i(\gamma_i^{(2)}(t) + 1),\, s_i(t))}}.$$

• After $s_i^{tp}$ is chosen, agent $i$ moves to the position $a_i^{tp}$ and sets its camera control
vector to be $c_i^{tp}$.

3: [Communication and computation:] At position $a_i^{tp}$, the active agent $i$
communicates with agents in $\mathcal{N}_i^{\mathrm{sen}}(s_i^{tp}, s_{-i}(t))$, and computes $u_i(s_i^{tp}, s_{-i}(t))$,
$\Delta_i((s_i^{tp}, s_{-i}(t)), s(\gamma_i(t) - 1))$, and $\mathcal{F}_i(a_i^{tp})$.

4: Repeat Steps 2 and 3.
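The non-experimenting branch picks between the two most recent actions with a Gibbs-like (logit) rule. The sketch below illustrates that kind of two-point distribution in a generic form, with weights proportional to `eps ** (-value)`; the scalar `value` attached to each action is a placeholder for whatever comparison quantity the algorithm uses and is an assumption of this example, as are all function names.

```python
import random

def keep_probability(eps, value_current, value_previous):
    """P(keep current action) under weights eps**(-value), i.e. 1 / (1 + eps**rho)."""
    rho = value_current - value_previous
    return 1.0 / (1.0 + eps ** rho)

def gibbs_choice(eps, current, previous, value, rng):
    """Sample one of the two most recent actions with the two-point Gibbs-like rule."""
    p_keep = keep_probability(eps, value(current), value(previous))
    return current if rng.random() < p_keep else previous

def active_agent(n_agents, rng):
    """Asynchronous activation: exactly one agent, chosen uniformly at random."""
    return rng.randrange(n_agents)

# Hypothetical values attached to two actions.
values = {"stay": 2.0, "back": 1.0}
rng = random.Random(0)
picks = [gibbs_choice(0.1, "stay", "back", values.get, rng) for _ in range(1000)]
```

As the exploration rate shrinks, the rule concentrates on the action with the larger value, while equal values give each action probability one half.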

Remark 3.3. A variation of the DIACL algorithm corresponds to $\epsilon(t) = \epsilon \in
(0, \frac{1}{2}]$ constant for all $t \ge 2$. If this is the case, we refer to the algorithm as the
Distributed Homogeneous Asynchronous Coverage Learning Algorithm (DHACL, for
short). Later, we will base the convergence analysis of the DIACL algorithm on that
of the DHACL algorithm. •

Like the DISCL algorithm, $z(t) := (s(t - 1), s(t))$ in the DIACL algorithm
constitutes a time-inhomogeneous Markov chain $\{\mathcal{P}_t\}$ on the space $\mathcal{B}'$. The following
theorem states the convergence property of the DIACL algorithm.

Theorem 3.2. Consider the Markov chain $\{\mathcal{P}_t\}$ induced by the DIACL algorithm
for the game $\Gamma_{\mathrm{cov}}$. Then it holds that $\lim_{t \to +\infty} \mathbb{P}(z(t) \in \mathrm{diag}\, S^*) = 1$.

The proof of Theorem 3.2 is provided in Section 4.

Remark 3.4. The authors in [19] proposed a synchronous payoff-based log-linear
learning algorithm. This algorithm is able to maximize the potential function of a
potential game. The DIACL algorithm is a variation of that in [19], and optimizes
a different function $U_g(s)$. Furthermore, the convergence of the DIACL algorithm is
in probability, which is stronger than the convergence with arbitrarily high probability
obtained in [19] by choosing an arbitrarily small exploration rate in advance. •

4. Convergence Analysis. In this section, we prove Theorems 3.1 and 3.2 by
appealing to the Theory of Resistance Trees in [31] and the results on strong ergodicity
in [14]. Relevant papers include [19], [20], where the Theory of Resistance Trees in [31]
is first utilized to study the class of payoff-based learning algorithms, and [12], [3], [22],
where the strong ergodicity theory is employed to characterize the convergence
properties of time-inhomogeneous Markov chains.

4.1. Convergence analysis of the DISCL Algorithm. We first utilize
Theorem 6.6 to characterize the convergence properties of the associated DHSCL algorithm.
This is essential for the analysis of the DISCL algorithm.

Observe that $z(t) := (s(t - 1), s(t))$ in the DHSCL algorithm constitutes a
time-homogeneous Markov chain $\{\mathcal{P}_t^{\epsilon}\}$ on the space $\mathcal{B}$. Consider $z, z' \in \mathcal{B}$. A feasible path
from $z$ to $z'$ consisting of multiple feasible transitions of $\{\mathcal{P}_t^{\epsilon}\}$ is denoted by $z \Rightarrow z'$.
The reachable set from $z$ is denoted as $\diamond z := \{z' \in \mathcal{B} \mid z \Rightarrow z'\}$.

Lemma 4.1. $\{\mathcal{P}_t^{\epsilon}\}$ is a regular perturbation of $\{\mathcal{P}_t^{0}\}$.

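Proof. Consider a feasible transition $z^1 \to z^2$ with $z^1 := (s^0, s^1)$ and $z^2 := (s^1, s^2)$.
Then we can define a partition of $V$ as $\Lambda_1 := \{i \in V \mid s_i^2 = s_i^{\tau_i(0,1)}\}$ and
$\Lambda_2 := \{i \in V \mid s_i^2 \in \mathcal{F}_i(a_i^1) \setminus \{s_i^{\tau_i(0,1)}\}\}$. The corresponding probability is given by

$$P^{\epsilon}_{z^1 z^2} = \prod_{i \in \Lambda_1} (1 - \epsilon) \times \prod_{j \in \Lambda_2} \frac{\epsilon}{|\mathcal{F}_j(a_j^1)| - 1}.$$

Hence, the resistance of the transition $z^1 \to z^2$ is $|\Lambda_2| \in \{0, 1, \dots, N\}$ since

$$0 < \lim_{\epsilon \to 0^+} \frac{P^{\epsilon}_{z^1 z^2}}{\epsilon^{|\Lambda_2|}} = \prod_{j \in \Lambda_2} \frac{1}{|\mathcal{F}_j(a_j^1)| - 1} < +\infty.$$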
We have that (A3) in Section 6.2 holds. It is not difficult to see that (A2) holds,
and we are now in a position to verify (A1). Since $\mathcal{G}_{\mathrm{loc}}$ is undirected and connected,
and multiple sensors can stay in the same position, then $\diamond a^0 = Q^N$ for any $a^0 \in Q^N$.
Since sensor $i$ can choose any camera control vector from $\mathcal{C}$ at each time, then $\diamond s^0 = A$
for any $s^0 \in A$. It implies that $\diamond z^0 = \mathcal{B}$ for any $z^0 \in \mathcal{B}$, and thus the Markov chain
$\{\mathcal{P}_t^{\epsilon}\}$ is irreducible on the space $\mathcal{B}$.

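It is easy to see that any state in $\mathrm{diag}\, A$ has period 1. Pick any $(s^0, s^1) \in \mathcal{B} \setminus \mathrm{diag}\, A$.
Since $\mathcal{G}_{\mathrm{loc}}$ is undirected, then $s_i^0 \in \mathcal{F}_i(a_i^1)$ if and only if $s_i^1 \in \mathcal{F}_i(a_i^0)$. Hence, the
following two paths are both feasible:

$$(s^0, s^1) \to (s^1, s^0) \to (s^0, s^1),$$
$$(s^0, s^1) \to (s^1, s^1) \to (s^1, s^0) \to (s^0, s^1).$$

These paths have lengths 2 and 3, whose greatest common divisor is 1.
Hence, the period of the state $(s^0, s^1)$ is 1. This proves aperiodicity of $\{\mathcal{P}_t^{\epsilon}\}$. Since
$\{\mathcal{P}_t^{\epsilon}\}$ is irreducible and aperiodic, then (A1) holds. □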

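Lemma 4.2. For any $(s^0, s^0) \in \mathrm{diag}\, A \setminus \mathrm{diag}\, \mathcal{E}(\Gamma_{\mathrm{cov}})$, there is a finite sequence of
transitions from $(s^0, s^0)$ to some $(s^*, s^*) \in \mathrm{diag}\, \mathcal{E}(\Gamma_{\mathrm{cov}})$ that satisfies

$$\mathcal{L} := (s^0, s^0) \xrightarrow{O(\epsilon)} (s^0, s^1) \xrightarrow{O(1)} (s^1, s^1) \xrightarrow{O(\epsilon)} (s^1, s^2) \xrightarrow{O(1)} \cdots \xrightarrow{O(1)} (s^k, s^k),$$

where $(s^k, s^k) = (s^*, s^*)$ for some $k \ge 1$.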

Proof. If $s^0 \notin \mathcal{E}(\Gamma_{\mathrm{cov}})$, there exists a sensor $i$ with an action $s_i^1 \in \mathcal{F}_i(a_i^0)$ such that
$u_i(s^1) > u_i(s^0)$ where $s^0_{-i} = s^1_{-i}$. The transition $(s^0, s^0) \to (s^0, s^1)$ happens when
only sensor $i$ experiments, and its corresponding probability is
$(1 - \epsilon)^{N-1} \times \frac{\epsilon}{|\mathcal{F}_i(a_i^0)| - 1}$.
Since the function $\phi$ is the potential function of the game $\Gamma_{\mathrm{cov}}$, then we have that
$\phi(s^1) - \phi(s^0) = u_i(s^1) - u_i(s^0)$ and thus $\phi(s^1) > \phi(s^0)$.

Since $u_i(s^1) > u_i(s^0)$ and $s^0_{-i} = s^1_{-i}$, the transition $(s^0, s^1) \to (s^1, s^1)$ occurs
when all sensors do not experiment, and the associated probability is $(1 - \epsilon)^N$.

We repeat the above process and construct the path $\mathcal{L}$ with length $k \ge 1$. Since
$\phi(s^i) > \phi(s^{i-1})$ for $i \in \{1, \dots, k\}$, then $s^i \neq s^j$ for $i \neq j$ and thus the path $\mathcal{L}$ has no
loop. Since $A$ is finite, then $k$ is finite and thus $s^k = s^* \in \mathcal{E}(\Gamma_{\mathrm{cov}})$. □

A direct result of Lemma 4.1 is that for each $\epsilon$, there exists a unique stationary
distribution of $\{\mathcal{P}_t^{\epsilon}\}$, say $\mu(\epsilon)$. We now proceed to utilize Theorem 6.6 to characterize
$\lim_{\epsilon \to 0^+} \mu(\epsilon)$.

Proposition 4.3. Consider the regular perturbation $\{\mathcal{P}_t^{\epsilon}\}$ of $\{\mathcal{P}_t^{0}\}$. Then
$\lim_{\epsilon \to 0^+} \mu(\epsilon)$
exists and the limiting distribution $\mu(0)$ is a stationary distribution of $\{\mathcal{P}_t^{0}\}$.
Furthermore, the stochastically stable states (i.e., the support of $\mu(0)$) are contained in the
set $\mathrm{diag}\, \mathcal{E}(\Gamma_{\mathrm{cov}})$.

Proof. Notice that the stochastically stable states are contained in the
recurrent communication classes of the unperturbed Markov chain that corresponds to the
DHSCL Algorithm with $\epsilon = 0$. Thus the stochastically stable states are included in
the set $\mathrm{diag}\, A \subset \mathcal{B}$. Denote by $T_{\min}$ the minimum resistance tree and by $h_v$ the root
of $T_{\min}$. Each edge of $T_{\min}$ has resistance $0, 1, 2, \dots$ corresponding to the transition
probability $O(1), O(\epsilon), O(\epsilon^2), \dots$. The state $z'$ is the successor of the state $z$ if and
only if $(z, z') \in T_{\min}$. Like Theorem 3.2 in [50], our analysis will be slightly different
from the presentation in Section 6.2. We will construct $T_{\min}$ over states in the set $\mathcal{B}$ (rather
than $\mathrm{diag}\, A$) with the restriction that all the edges leaving the states in $\mathcal{B} \setminus \mathrm{diag}\, A$ have
resistance 0. The stochastically stable states are not changed under this difference.

Claim 1. For any $(s^0,s^1) \in \mathcal{B} \setminus \mathrm{diag}\,\mathcal{A}$, there is a finite path

$$\mathcal{L} := (s^0,s^1) \xrightarrow{O(1)} (s^1,s^2) \xrightarrow{O(1)} (s^2,s^2)$$

where sf = sj ^ for all i G V . 

Proof. Both transitions occur when no sensor experiments. The corresponding probability of each transition is $(1-\epsilon)^N$. □

Claim 2. The root $h_v$ belongs to the set $\mathrm{diag}\,\mathcal{A}$.

Proof. Suppose that $h_v = (s^0,s^1) \in \mathcal{B} \setminus \mathrm{diag}\,\mathcal{A}$. By Claim 1, there is a finite path $\mathcal{L}' := (s^0,s^1) \xrightarrow{O(1)} (s^1,s^2) \xrightarrow{O(1)} (s^2,s^2)$. We now construct a new tree $T'$ by adding the edges of the path $\mathcal{L}'$ into the tree $T_{\min}$ and removing the redundant edges. The total resistance of the added edges is 0. Observe that the resistance of the removed edge exiting from $(s^2,s^2)$ in the tree $T_{\min}$ is at least 1. Hence, the resistance of $T'$ is strictly lower than that of $T_{\min}$, and we reach a contradiction. □

Claim 3. Pick any $s^* \in \mathcal{E}(\Gamma_{\text{cov}})$ and consider $z := (s^*,s^*)$, $z' := (s^*,\tilde{s})$ where $\tilde{s} \neq s^*$. If $(z,z') \in T_{\min}$, then the resistance of the edge $(z,z')$ is some $k \ge 2$.

Proof. Suppose the deviator in the transition $z \to z'$ is unique, say $i$. Then the corresponding transition probability is $O(\epsilon)$. Since $s^* \in \mathcal{E}(\Gamma_{\text{cov}})$ and $\tilde{s}_i \in \mathcal{F}_i(a_i^*)$, we have that $u_i(s_i^*, s_{-i}^*) \ge u_i(\tilde{s}_i, \tilde{s}_{-i})$, where $s^*_{-i} = \tilde{s}_{-i}$.

Since $z' \in \mathcal{B} \setminus \mathrm{diag}\,\mathcal{A}$, it follows from Claim 2 that the state $z'$ cannot be the root of $T_{\min}$ and thus has a successor $z''$. Note that all the edges leaving the states in $\mathcal{B} \setminus \mathrm{diag}\,\mathcal{A}$ have resistance 0. Then no sensor experiments in the transition $z' \to z''$ and $z'' = (\tilde{s}, \hat{s})$ for some $\hat{s}$. Since $u_i(s_i^*, s_{-i}^*) \ge u_i(\tilde{s}_i, \tilde{s}_{-i})$ with $s^*_{-i} = \tilde{s}_{-i}$, we have $\hat{s} = s^*$ and thus $z'' = (\tilde{s}, s^*)$. Similarly, the state $z''$ must have a successor $z'''$ and $z''' = z$. We then obtain a loop in $T_{\min}$, which contradicts that $T_{\min}$ is a tree.

This implies that at least two sensors experiment in the transition $z \to z'$. Thus the resistance of the edge $(z,z')$ is at least 2. □

Claim 4. The root $h_v$ belongs to the set $\mathrm{diag}\,\mathcal{E}(\Gamma_{\text{cov}})$.

Proof. Suppose that $h_v = (s^0,s^0) \notin \mathrm{diag}\,\mathcal{E}(\Gamma_{\text{cov}})$. By Lemma 4.2 there is a finite path $\mathcal{L}$ connecting $(s^0,s^0)$ and some $(s^*,s^*) \in \mathrm{diag}\,\mathcal{E}(\Gamma_{\text{cov}})$. We now construct a new tree $T'$ by adding the edges of the path $\mathcal{L}$ into the tree $T_{\min}$ and removing the edges that leave the states in $\mathcal{L}$ in the tree $T_{\min}$. The total resistance of the added edges is $k$. Observe that the resistance of the removed edge exiting from $(s^i,s^i)$ in the tree $T_{\min}$ is at least 1 for $i \in \{1,\dots,k-1\}$. By Claim 3, the resistance of the removed edge leaving from $(s^*,s^*)$ in the tree $T_{\min}$ is at least 2. The total resistance of the removed edges is at least $k+1$. Hence, the resistance of $T'$ is strictly lower than that of $T_{\min}$, and we reach a contradiction. □

It follows from Claim 4 that the states in $\mathrm{diag}\,\mathcal{E}(\Gamma_{\text{cov}})$ have minimum stochastic potential. Since Lemma 4.1 shows that the Markov chain $\{\mathcal{P}_t^\epsilon\}$ is a regularly perturbed Markov process, Proposition 4.3 is a direct result of Theorem 6.6. □

We are now ready to show Theorem 3.1.

Proof of Theorem 3.1:

Claim 5. Condition (B2) in Theorem 6.5 holds.

Proof. For each $t \ge 0$ and each $z \in X$, we define the numbers

$$\sigma_z(\epsilon(t)) := \sum_{T \in G(z)} \prod_{(x,y) \in T} P^{\epsilon(t)}_{xy}, \qquad \sigma_z^t := \sigma_z(\epsilon(t)),$$

$$\mu_z(\epsilon(t)) := \frac{\sigma_z(\epsilon(t))}{\sum_{x \in X} \sigma_x(\epsilon(t))}, \qquad \mu_z^t := \mu_z(\epsilon(t)).$$

Since $\{\mathcal{P}_t^\epsilon\}$ is a regular perturbation of $\{\mathcal{P}_t^0\}$, it is irreducible and thus $\sigma_z^t > 0$. As in Lemma 3.1 of Chapter 6 in [8], one can show that $(\mu^t)^T P^{\epsilon(t)} = (\mu^t)^T$. Therefore, condition (B2) in Theorem 6.5 holds. □

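Claim 6. Condition (B3) in Theorem 6.5 holds.

Proof. We now proceed to verify condition (B3) in Theorem 6.5. To do that, let us first fix $t$, denote $\epsilon = \epsilon(t)$ and study the monotonicity of $\mu_z(\epsilon)$ with respect to $\epsilon$. We write $\sigma_z(\epsilon)$ in the following form

$$\sigma_z(\epsilon) = \sum_{T \in G(z)} \prod_{(x,y) \in T} P^{\epsilon}_{xy} = \sum_{T \in G(z)} \prod_{(x,y) \in T} \frac{\alpha_{xy}(\epsilon)}{\beta_{xy}(\epsilon)} = \frac{\alpha_z(\epsilon)}{\beta_z(\epsilon)} \qquad (4.2)$$

for some polynomials $\alpha_z(\epsilon)$ and $\beta_z(\epsilon)$ in $\epsilon$. With (4.2) in hand, we have that $\sum_{x \in X} \sigma_x(\epsilon)$ and thus $\mu_z(\epsilon)$ are ratios of two polynomials in $\epsilon$; i.e., $\mu_z(\epsilon) = \frac{\varphi_z(\epsilon)}{\beta(\epsilon)}$ where $\varphi_z(\epsilon)$ and $\beta(\epsilon)$ are polynomials in $\epsilon$. The derivative of $\mu_z(\epsilon)$ is given by

$$\frac{\partial \mu_z(\epsilon)}{\partial \epsilon} = \frac{1}{\beta(\epsilon)^2}\left(\frac{\partial \varphi_z(\epsilon)}{\partial \epsilon}\,\beta(\epsilon) - \varphi_z(\epsilon)\,\frac{\partial \beta(\epsilon)}{\partial \epsilon}\right).$$

Note that the numerator $\frac{\partial \varphi_z(\epsilon)}{\partial \epsilon}\beta(\epsilon) - \varphi_z(\epsilon)\frac{\partial \beta(\epsilon)}{\partial \epsilon}$ is a polynomial in $\epsilon$. Denote by $\iota_z \neq 0$ the coefficient of the leading (lowest-order) term of this numerator. The leading term dominates the numerator when $\epsilon$ is sufficiently small. Thus there exists $\epsilon_z > 0$ such that the sign of $\frac{\partial \mu_z(\epsilon)}{\partial \epsilon}$ is the sign of $\iota_z$ for all $0 < \epsilon \le \epsilon_z$. Let $\epsilon^* = \min_{z \in X} \epsilon_z$, so that these sign conditions hold simultaneously for all $z \in X$ whenever $0 < \epsilon \le \epsilon^*$.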

Since $\epsilon(t)$ strictly decreases to zero, there is a unique finite time instant $t^*$ such that $\epsilon(t^*) = \epsilon^*$ (if $\epsilon(0) < \epsilon^*$, then $t^* = 0$). Since $\epsilon(t)$ is strictly decreasing, we can define a partition of $X$ as follows:

$$\Xi_1 := \{z \in X \mid \mu_z(\epsilon(t)) > \mu_z(\epsilon(t+1)),\ \forall t \in [t^*, +\infty)\},$$
$$\Xi_2 := \{z \in X \mid \mu_z(\epsilon(t)) < \mu_z(\epsilon(t+1)),\ \forall t \in [t^*, +\infty)\}.$$

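We are now ready to verify (B3) of Theorem 6.5. Since $\{\mathcal{P}_t^\epsilon\}$ is a regular perturbed Markov chain of $\{\mathcal{P}_t^0\}$, it follows from Theorem 6.6 that $\lim_{t\to+\infty}\mu_z(\epsilon(t)) = \mu_z(0)$, and thus it holds that

$$\sum_{t=0}^{+\infty}\sum_{z\in X}|\mu_z^t - \mu_z^{t+1}| = \sum_{t=0}^{+\infty}\sum_{z\in X}|\mu_z(\epsilon(t)) - \mu_z(\epsilon(t+1))|$$

$$= \sum_{t=0}^{t^*}\sum_{z\in X}|\mu_z(\epsilon(t)) - \mu_z(\epsilon(t+1))| + \sum_{t=t^*+1}^{+\infty}\Bigl(\sum_{z\in\Xi_1}\mu_z(\epsilon(t)) - \sum_{z\in\Xi_1}\mu_z(\epsilon(t+1))\Bigr)$$

$$\quad + \sum_{t=t^*+1}^{+\infty}\Bigl(\bigl(1 - \sum_{z\in\Xi_1}\mu_z(\epsilon(t+1))\bigr) - \bigl(1 - \sum_{z\in\Xi_1}\mu_z(\epsilon(t))\bigr)\Bigr)$$

$$= \sum_{t=0}^{t^*}\sum_{z\in X}|\mu_z(\epsilon(t)) - \mu_z(\epsilon(t+1))| + 2\sum_{z\in\Xi_1}\mu_z(\epsilon(t^*+1)) - 2\sum_{z\in\Xi_1}\mu_z(0) < +\infty.$$

□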

Claim 7. Condition (B1) in Theorem 6.5 holds.

Proof. Denote by $P^{\epsilon(t)}$ the transition matrix of $\{\mathcal{P}_t\}$, with the probability of a feasible transition $z^1 \to z^2$ given as in (4.1). Observe that $|\mathcal{F}_i(a_i^1)| \le 5|\mathcal{C}|$. Since $\epsilon(t)$ is strictly decreasing, there is $t_0 \ge 1$ such that $t_0$ is the first time when $1 - \epsilon(t) \ge \frac{\epsilon(t)}{5|\mathcal{C}|-1}$. Then for all $t \ge t_0$, it holds that

$$P^{\epsilon(t)}_{z^1 z^2} \ge \left(\frac{\epsilon(t)}{5|\mathcal{C}|-1}\right)^{N}.$$

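Denote $P(m,n) := \prod_{t=m}^{n-1} P^{\epsilon(t)}$, $0 \le m < n$. Pick any $z \in \mathcal{B}$ and let $u_z \in \mathcal{B}$ be such that $P_{u_z z}(t, t+D+1) = \min_{x\in\mathcal{B}} P_{xz}(t, t+D+1)$. Consequently, it follows that for all $t \ge t_0$,

$$\min_{x\in\mathcal{B}} P_{xz}(t,t+D+1) = \sum_{i_1\in\mathcal{B}}\cdots\sum_{i_D\in\mathcal{B}} P^{\epsilon(t)}_{u_z i_1}\cdots P^{\epsilon(t+D-1)}_{i_{D-1} i_D} P^{\epsilon(t+D)}_{i_D z}$$

$$\ge P^{\epsilon(t)}_{u_z i_1}\cdots P^{\epsilon(t+D-1)}_{i_{D-1} i_D} P^{\epsilon(t+D)}_{i_D z} \ge \prod_{i=0}^{D}\left(\frac{\epsilon(t+i)}{5|\mathcal{C}|-1}\right)^{N} \ge \left(\frac{\epsilon(t+D)}{5|\mathcal{C}|-1}\right)^{(D+1)N},$$

where the second line is evaluated along one feasible path from $u_z$ to $z$, and in the last inequality we use $\epsilon(t)$ being strictly decreasing. Then we have

$$1 - \lambda(P(t,t+D+1)) = \min_{x,y\in\mathcal{B}} \sum_{z\in\mathcal{B}} \min\{P_{xz}(t,t+D+1), P_{yz}(t,t+D+1)\}$$

$$\ge \sum_{z\in\mathcal{B}} P_{u_z z}(t,t+D+1) \ge |\mathcal{B}|\left(\frac{\epsilon(t+D)}{5|\mathcal{C}|-1}\right)^{(D+1)N}.$$

Choose $k_i := (D+1)i$ and let $i_0$ be the smallest integer such that $(D+1)i_0 \ge t_0$. Since $k_i + D < k_{i+1}$ and $\epsilon(t)$ is decreasing, we have that:

$$\sum_{i=i_0}^{+\infty}\bigl(1 - \lambda(P(k_i,k_{i+1}))\bigr) \ge |\mathcal{B}| \sum_{i=i_0}^{+\infty}\left(\frac{\epsilon(k_{i+1})}{5|\mathcal{C}|-1}\right)^{(D+1)N} = \frac{|\mathcal{B}|}{(5|\mathcal{C}|-1)^{(D+1)N}} \sum_{i=i_0}^{+\infty}\frac{1}{(D+1)(i+1)} = +\infty. \qquad (4.3)$$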

Hence, the weak ergodicity property follows from Theorem 6.4. □

All the conditions in Theorem 6.5 hold. Thus it follows from Theorem 6.5 that the limiting distribution is $\mu^* = \lim_{t\to+\infty}\mu^t$. Note that $\lim_{t\to+\infty}\mu^t = \lim_{t\to+\infty}\mu(\epsilon(t)) = \mu(0)$ and Proposition 4.3 shows that the support of $\mu(0)$ is contained in the set $\mathrm{diag}\,\mathcal{E}(\Gamma_{\text{cov}})$. Hence, the support of $\mu^*$ is contained in the set $\mathrm{diag}\,\mathcal{E}(\Gamma_{\text{cov}})$, implying that $\lim_{t\to+\infty}\mathbb{P}(z(t) \in \mathrm{diag}\,\mathcal{E}(\Gamma_{\text{cov}})) = 1$. This completes the proof.

4.2. Convergence analysis of the DIACL Algorithm. First of all, we employ Theorem 6.6 to study the convergence properties of the associated DHACL algorithm. This is essential to analyze the DIACL algorithm.

To simplify notation, we will use $s_i(t-1) := s_i(\gamma_i(t)+1)$ in the remainder of this section. Observe that $z(t) := (s(t-1), s(t))$ in the DHACL algorithm constitutes a Markov chain $\{\mathcal{P}_t^\epsilon\}$ on the space $\mathcal{B}'$.

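Lemma 4.4. The Markov chain $\{\mathcal{P}_t^\epsilon\}$ is a regular perturbation of $\{\mathcal{P}_t^0\}$.

Proof. Pick any two states $z^1 := (s^0,s^1)$ and $z^2 := (s^1,s^2)$ with $z^1 \neq z^2$. We have that $P^\epsilon_{z^1 z^2} > 0$ if and only if there is some $i \in V$ such that $s^1_{-i} = s^2_{-i}$ and one of the following occurs: $s_i^2 \in \mathcal{F}_i(a_i^1) \setminus \{s_i^0, s_i^1\}$, $s_i^2 = s_i^1$ or $s_i^2 = s_i^0$. In particular, the following holds:

$$P^\epsilon_{z^1 z^2} = \begin{cases} \eta_1, & s_i^2 \in \mathcal{F}_i(a_i^1) \setminus \{s_i^0, s_i^1\},\\ \eta_2, & s_i^2 = s_i^1,\\ \eta_3, & s_i^2 = s_i^0, \end{cases}$$

where

$$\eta_1 := \frac{\epsilon^{m_i}}{N\,|\mathcal{F}_i(a_i^1) \setminus \{s_i^0, s_i^1\}|}, \qquad \eta_2 := \frac{1 - \epsilon^{m_i}}{N(1 + \epsilon^{\rho_i(s^0,s^1)})}, \qquad \eta_3 := \frac{(1 - \epsilon^{m_i}) \times \epsilon^{\rho_i(s^0,s^1)}}{N(1 + \epsilon^{\rho_i(s^0,s^1)})}.$$

Observe that $0 < \lim_{\epsilon\to0^+} \frac{\eta_1}{\epsilon^{m_i}} < +\infty$. Multiplying the numerator and denominator of $\eta_2$ by $\epsilon^{\Psi_i(s^1,s^0) - (u_i(s^1) - \Delta_i(s^1,s^0))}$, we obtain

$$\eta_2 = \frac{1 - \epsilon^{m_i}}{N} \times \frac{\epsilon^{\Psi_i(s^0,s^1) - (u_i(s^1) - \Delta_i(s^1,s^0))}}{\eta_2'},$$

where $\eta_2' := \epsilon^{\Psi_i(s^0,s^1) - (u_i(s^1) - \Delta_i(s^1,s^0))} + \epsilon^{\Psi_i(s^0,s^1) - (u_i(s^0) - \Delta_i(s^0,s^1))}$. Since $\Psi_i(s^0,s^1) = \max\{u_i(s^0) - \Delta_i(s^0,s^1),\ u_i(s^1) - \Delta_i(s^1,s^0)\}$, both exponents in $\eta_2'$ are nonnegative and at least one of them is zero. Use

$$\lim_{\epsilon\to0^+} \epsilon^{x} = \begin{cases} 1, & x = 0,\\ 0, & x > 0, \end{cases}$$

and we have

$$\lim_{\epsilon\to0^+} \frac{\eta_2}{\epsilon^{\Psi_i(s^0,s^1) - (u_i(s^1) - \Delta_i(s^1,s^0))}} = \begin{cases} \frac{1}{2N}, & u_i(s^1) - \Delta_i(s^1,s^0) = u_i(s^0) - \Delta_i(s^0,s^1),\\ \frac{1}{N}, & \text{otherwise.} \end{cases}$$

Similarly, it holds that

$$\lim_{\epsilon\to0^+} \frac{\eta_3}{\epsilon^{\Psi_i(s^0,s^1) - (u_i(s^0) - \Delta_i(s^0,s^1))}} \in \Bigl\{\frac{1}{2N}, \frac{1}{N}\Bigr\}.$$

Hence, the resistance of the feasible transition $z^1 \to z^2$, with $z^1 \neq z^2$ and sensor $i$ as the unilateral deviator, can be described as follows:

$$\chi(z^1 \to z^2) = \begin{cases} m_i, & s_i^2 \in \mathcal{F}_i(a_i^1) \setminus \{s_i^0, s_i^1\},\\ \Psi_i(s^0,s^1) - (u_i(s^1) - \Delta_i(s^1,s^0)), & s_i^2 = s_i^1,\\ \Psi_i(s^0,s^1) - (u_i(s^0) - \Delta_i(s^0,s^1)), & s_i^2 = s_i^0. \end{cases}$$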

Then (A3) in Section 6.2 holds. It is straightforward to verify that (A2) in Section 6.2 holds. We are now in a position to verify (A1). Since $\mathcal{G}_{\text{loc}}$ is undirected and connected, and multiple sensors can stay in the same position, then $\diamond a^0 = \mathcal{Q}$ for any $a^0 \in \mathcal{Q}$. Since sensor $i$ can choose any camera control vector from $\mathcal{C}$ at each time, then $\diamond s^0 = \mathcal{A}$ for any $s^0 \in \mathcal{A}$. This implies that $\diamond z^0 = \mathcal{B}'$ for any $z^0 \in \mathcal{B}'$, and thus the Markov chain $\{\mathcal{P}_t^\epsilon\}$ is irreducible on the space $\mathcal{B}'$.

It is easy to see that any state in $\mathrm{diag}\,\mathcal{A}$ has period 1. Pick any $(s^0,s^1) \in \mathcal{B}' \setminus \mathrm{diag}\,\mathcal{A}$. Since $\mathcal{G}_{\text{loc}}$ is undirected, then $s_i^0 \in \mathcal{F}_i(a_i^1)$ if and only if $s_i^1 \in \mathcal{F}_i(a_i^0)$. Hence, the following two paths are both feasible:

$$(s^0,s^1) \to (s^1,s^0) \to (s^0,s^1),$$
$$(s^0,s^1) \to (s^1,s^1) \to (s^1,s^0) \to (s^0,s^1).$$

These two cycles have lengths 2 and 3, so the period of the state $(s^0,s^1)$ is 1. This proves aperiodicity of $\{\mathcal{P}_t^\epsilon\}$. Since $\{\mathcal{P}_t^\epsilon\}$ is irreducible and aperiodic, then (A1) holds. □
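
As a sanity check on the three transition cases in this proof, the sketch below evaluates them for a single sensor and confirms that they account for the full probability mass $1/N$ of that sensor being selected. It assumes the forms $\eta_1 = \epsilon^{m_i}/(N n)$, $\eta_2 = (1-\epsilon^{m_i})/(N(1+\epsilon^{\rho}))$ and $\eta_3 = (1-\epsilon^{m_i})\epsilon^{\rho}/(N(1+\epsilon^{\rho}))$, where $n$ is the number of alternative actions; the function name and all numerical values are hypothetical.

```python
def dhacl_transition_probs(eps, m_i, rho, n_sensors, n_alternatives):
    """Return (eta1, eta2, eta3) for one sensor under the assumed formulas.

    eps            exploration rate in (0, 1)
    m_i            exploration exponent of sensor i
    rho            exponent rho_i(s^0, s^1) comparing the two remembered actions
    n_sensors      N, the number of sensors (one is selected uniformly)
    n_alternatives number of feasible actions other than s_i^0 and s_i^1 (must be > 0)
    """
    explore = eps ** m_i                       # probability that sensor i experiments
    eta1 = explore / (n_sensors * n_alternatives)
    weight = eps ** rho
    eta2 = (1.0 - explore) / (n_sensors * (1.0 + weight))           # keep s_i^1
    eta3 = (1.0 - explore) * weight / (n_sensors * (1.0 + weight))  # revert to s_i^0
    return eta1, eta2, eta3

# Hypothetical values chosen only for illustration.
eps, m_i, rho, N, n_alt = 0.01, 3.0, 0.5, 4, 6
eta1, eta2, eta3 = dhacl_transition_probs(eps, m_i, rho, N, n_alt)
total = n_alt * eta1 + eta2 + eta3
print(eta1, eta2, eta3, total)
```

With $\rho > 0$ the weight $\epsilon^{\rho}$ is below one, so keeping the current action is more likely than reverting, and the gap widens as $\epsilon$ shrinks.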

A direct result of Lemma 4.4 is that for each $\epsilon > 0$, there exists a unique stationary distribution of $\{\mathcal{P}_t^\epsilon\}$, say $\mu(\epsilon)$. From the proof of Lemma 4.4 we can see that the resistance of an experiment is $m_i$ if sensor $i$ is the unilateral deviator. We now proceed to utilize Theorem 6.6 to characterize $\lim_{\epsilon\to0^+}\mu(\epsilon)$.

Proposition 4.5. Consider the regular perturbed Markov process $\{\mathcal{P}_t^\epsilon\}$. Then $\lim_{\epsilon\to0^+}\mu(\epsilon)$ exists and the limiting distribution $\mu(0)$ is a stationary distribution of $\{\mathcal{P}_t^0\}$. Furthermore, the stochastically stable states (i.e., the support of $\mu(0)$) are contained in the set $\mathrm{diag}\,S^*$.

Proof. The unperturbed Markov chain corresponds to the DHACL Algorithm with $\epsilon = 0$. Hence, the recurrent communication classes of the unperturbed Markov chain are contained in the set $\mathrm{diag}\,\mathcal{A}$. We will construct resistance trees over vertices in the set $\mathrm{diag}\,\mathcal{A}$. Denote by $T_{\min}$ the minimum resistance tree. The remainder of the proof is divided into the following four claims.

Claim 8. $\chi((s^0,s^0) \Rightarrow (s^1,s^1)) = m_i + \Psi_i(s^1,s^0) - (u_i(s^1) - \Delta_i(s^1,s^0))$ where $s^0 \neq s^1$ and the transition $s^0 \to s^1$ is feasible with sensor $i$ as the unilateral deviator.

Proof. One feasible path for $(s^0,s^0) \Rightarrow (s^1,s^1)$ is $\mathcal{L} := (s^0,s^0) \to (s^0,s^1) \to (s^1,s^1)$ where sensor $i$ experiments in the first transition and does not experiment in the second one. The total resistance of the path $\mathcal{L}$ is $m_i + \Psi_i(s^1,s^0) - (u_i(s^1) - \Delta_i(s^1,s^0))$, which is at most $m_i + m^*$.

Denote by $\mathcal{L}'$ the path with minimum resistance among all the feasible paths for $(s^0,s^0) \Rightarrow (s^1,s^1)$. Assume that the first transition in $\mathcal{L}'$ is $(s^0,s^0) \to (s^0,s^2)$ where node $j$ experiments and $s^2 \neq s^1$. Observe that the resistance of $(s^0,s^0) \to (s^0,s^2)$ is $m_j$. No matter whether $j$ is equal to $i$ or not, the path $\mathcal{L}'$ must include at least one more experiment to introduce $s_i^1$. Hence the total resistance of the path $\mathcal{L}'$ is at least $m_i + m_j$. Since $m_i + m_j > m_i + 2m^*$, the path $\mathcal{L}'$ has a strictly larger resistance than the path $\mathcal{L}$. To avoid a contradiction, the path $\mathcal{L}'$ must start from the transition $(s^0,s^0) \to (s^0,s^1)$. Similarly, the subsequent transition (which is also the last one) in the path $\mathcal{L}'$ must be $(s^0,s^1) \to (s^1,s^1)$ and thus $\mathcal{L}' = \mathcal{L}$. Hence, the resistance of the transition $(s^0,s^0) \Rightarrow (s^1,s^1)$ is the total resistance of the path $\mathcal{L}$; i.e., $m_i + \Psi_i(s^1,s^0) - (u_i(s^1) - \Delta_i(s^1,s^0))$. □

Claim 9. All the edges $((s,s),(s',s'))$ in $T_{\min}$ must consist of only one deviator; i.e., $s_i \neq s_i'$ and $s_{-i} = s'_{-i}$ for some $i \in V$.

Proof. Assume that $(s,s) \Rightarrow (s',s')$ has at least two deviators. Suppose the path $\mathcal{L}$ has the minimum resistance among all the paths from $(s,s)$ to $(s',s')$. Then, $\ell \ge 2$ experiments are carried out along $\mathcal{L}$. Denote by $i_k$ the unilateral deviator in the $k$-th experiment $s^{k-1} \to s^k$ where $1 \le k \le \ell$, $s^0 = s$ and $s^\ell = s'$. Then the resistance of $\mathcal{L}$ is at least $\sum_{k=1}^{\ell} m_{i_k}$; i.e., $\chi((s^0,s^0) \Rightarrow (s^\ell,s^\ell)) \ge \sum_{k=1}^{\ell} m_{i_k}$.

Let us consider the following path, which visits the intermediate diagonal states one experiment at a time:

$$\bar{\mathcal{L}} := (s^0,s^0) \Rightarrow (s^1,s^1) \Rightarrow \cdots \Rightarrow (s^\ell,s^\ell).$$

From Claim 8, we know that the total resistance of the path $\bar{\mathcal{L}}$ is at most $\sum_{k=1}^{\ell} m_{i_k} + \ell m^*$.

A new tree $T'$ can be obtained by adding the edges of $\bar{\mathcal{L}}$ into $T_{\min}$ and removing the redundant edges. The removed resistance is strictly greater than $\sum_{k=1}^{\ell} m_{i_k} + 2(\ell-1)m^*$, where $\sum_{k=1}^{\ell} m_{i_k}$ is the lower bound on the resistance of the edge from $(s^0,s^0)$ to $(s^\ell,s^\ell)$, and $2(\ell-1)m^*$ is the strict lower bound on the total resistance of the edges leaving $(s^k,s^k)$ for $k = 1,\dots,\ell-1$. The added resistance is the total resistance of $\bar{\mathcal{L}}$, which is at most $\sum_{k=1}^{\ell} m_{i_k} + \ell m^*$. Since $\ell \ge 2$, we have that $2(\ell-1)m^* \ge \ell m^*$ and thus $T'$ has a strictly lower resistance than $T_{\min}$. This contradicts the fact that $T_{\min}$ is a minimum resistance tree. □

Claim 10. Given any edge $((s,s),(s',s'))$ in $T_{\min}$, denote by $i$ the unilateral deviator between $s$ and $s'$. Then the transition $s_i \to s_i'$ is feasible.

Proof. Assume that the transition $s_i \to s_i'$ is infeasible. Suppose the path $\mathcal{L}$ has the minimum resistance among all the paths from $(s,s)$ to $(s',s')$. Then, there are $\ell \ge 2$ experiments in $\mathcal{L}$. The remainder of the proof is similar to that of Claim 9. □

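Claim 11. Let $h_v$ be the root of $T_{\min}$. Then, $h_v \in \mathrm{diag}\,S^*$.

Proof. Assume that $h_v = (s^0,s^0) \notin \mathrm{diag}\,S^*$. Pick any $(s^*,s^*) \in \mathrm{diag}\,S^*$. By Claims 9 and 10, there is a path from $(s^*,s^*)$ to $(s^0,s^0)$ in the tree $T_{\min}$ as follows:

$$\mathcal{L} := (s^\ell,s^\ell) \Rightarrow (s^{\ell-1},s^{\ell-1}) \Rightarrow \cdots \Rightarrow (s^1,s^1) \Rightarrow (s^0,s^0)$$

for some $\ell \ge 1$. Here, $s^* = s^\ell$, there is only one deviator, say $i_k$, from $s^k$ to $s^{k-1}$, and the transition $s^k \to s^{k-1}$ is feasible for $k = \ell,\dots,1$.

Since the transition $s^k \to s^{k+1}$ is also feasible for $k = 0,\dots,\ell-1$, we obtain the reverse path $\mathcal{L}'$ of $\mathcal{L}$ as follows:

$$\mathcal{L}' := (s^0,s^0) \Rightarrow (s^1,s^1) \Rightarrow \cdots \Rightarrow (s^{\ell-1},s^{\ell-1}) \Rightarrow (s^\ell,s^\ell).$$

By Claim 8, the total resistance of the path $\mathcal{L}$ is

$$\chi(\mathcal{L}) = \sum_{k=1}^{\ell} m_{i_k} + \sum_{k=1}^{\ell}\bigl\{\Psi_{i_k}(s^k,s^{k-1}) - (u_{i_k}(s^{k-1}) - \Delta_{i_k}(s^{k-1},s^k))\bigr\},$$

and the total resistance of the path $\mathcal{L}'$ is

$$\chi(\mathcal{L}') = \sum_{k=1}^{\ell} m_{i_k} + \sum_{k=1}^{\ell}\bigl\{\Psi_{i_k}(s^{k-1},s^k) - (u_{i_k}(s^k) - \Delta_{i_k}(s^k,s^{k-1}))\bigr\}.$$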

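Denote $\Lambda_1 := (\mathcal{D}(a^k_{i_k}, r^k_{i_k}) \setminus \mathcal{D}(a^{k-1}_{i_k}, r^{k-1}_{i_k})) \cap \mathcal{Q}$ and $\Lambda_2 := (\mathcal{D}(a^{k-1}_{i_k}, r^{k-1}_{i_k}) \setminus \mathcal{D}(a^k_{i_k}, r^k_{i_k})) \cap \mathcal{Q}$. Observe that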

= Uu ( S fe ) - u^,*" 1 ) - E - ~^7~TT^) + E W q (^r ~ -^PrJ 

^ q n q (s k - 1 ) n q (s k ) ' ^ n q (s k ) n^- 1 )' 

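$$= (u_{i_k}(s^k) - \Delta_{i_k}(s^k,s^{k-1})) - (u_{i_k}(s^{k-1}) - \Delta_{i_k}(s^{k-1},s^k)).$$

We now construct a new tree $T'$ with the root $(s^*,s^*)$ by adding the edges of $\mathcal{L}'$ to the tree $T_{\min}$ and removing the redundant edges of $\mathcal{L}$. Since $\Psi_{i_k}(s^{k-1},s^k) = \Psi_{i_k}(s^k,s^{k-1})$, the difference between the total resistances $\chi(T')$ and $\chi(T_{\min})$ of the two trees is given by

$$\chi(T') - \chi(T_{\min}) = \chi(\mathcal{L}') - \chi(\mathcal{L})$$

$$= -\sum_{k=1}^{\ell}\bigl(u_{i_k}(s^k) - \Delta_{i_k}(s^k,s^{k-1})\bigr) + \sum_{k=1}^{\ell}\bigl(u_{i_k}(s^{k-1}) - \Delta_{i_k}(s^{k-1},s^k)\bigr) < 0.$$

The sum is strictly negative because, by the identity observed above, it telescopes to the difference between the values of the coverage performance metric at $s^0$ and at $s^\ell = s^*$, and $s^* \in S^*$ is a global optimum while $s^0 \notin S^*$. This contradicts that $T_{\min}$ is a minimum resistance tree. □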

It follows from Claim 11 that the state $h_v \in \mathrm{diag}\,S^*$ has minimum stochastic potential. Then Proposition 4.5 is a direct result of Theorem 6.6. □

We are now ready to show Theorem 3.2.

Proof of Theorem 3.2:

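Claim 12. Condition (B2) in Theorem 6.5 holds.

Proof. The proof is analogous to Claim 5. □

Claim 13. Condition (B3) in Theorem 6.5 holds.

Proof. Denote by $P^{\epsilon(t)}$ the transition matrix of $\{\mathcal{P}_t\}$. Consider the feasible transition $z^1 \to z^2$ with unilateral deviator $i$. The corresponding probability is given by

$$P^{\epsilon(t)}_{z^1 z^2} = \begin{cases} \eta_1, & s_i^2 \in \mathcal{F}_i(a_i^1) \setminus \{s_i^0, s_i^1\},\\ \eta_2, & s_i^2 = s_i^1,\\ \eta_3, & s_i^2 = s_i^0, \end{cases}$$

where

$$\eta_1 := \frac{\epsilon(t)^{m_i}}{N\,|\mathcal{F}_i(a_i^1) \setminus \{s_i^0, s_i^1\}|}, \qquad \eta_2 := \frac{1 - \epsilon(t)^{m_i}}{N(1 + \epsilon(t)^{\rho_i(s^0,s^1)})}, \qquad \eta_3 := \frac{(1 - \epsilon(t)^{m_i}) \times \epsilon(t)^{\rho_i(s^0,s^1)}}{N(1 + \epsilon(t)^{\rho_i(s^0,s^1)})}.$$

The remainder is analogous to Claim 6. □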

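Claim 14. Condition (B1) in Theorem 6.5 holds.

Proof. Observe that $|\mathcal{F}_i(a_i^1)| \le 5|\mathcal{C}|$. Since $\epsilon(t)$ is strictly decreasing, there is $t_0 \ge 1$ such that $t_0$ is the first time when $1 - \epsilon(t)^{m_i} \ge \epsilon(t)^{m_i}$. Observe that for all $t \ge 1$, it holds that

$$\eta_1 \ge \frac{\epsilon(t)^{m_i}}{N(5|\mathcal{C}|-1)} \ge \frac{\epsilon(t)^{m_i+m^*}}{N(5|\mathcal{C}|-1)}.$$

Denote $b := u_i(s^1) - \Delta_i(s^1,s^0)$ and $a := u_i(s^0) - \Delta_i(s^0,s^1)$. Then $\rho_i(s^0,s^1) = b - a$. Since $|b - a| \le m^*$, then for $t \ge t_0$ it holds that

$$\eta_2 = \frac{1 - \epsilon(t)^{m_i}}{N(1 + \epsilon(t)^{b-a})} = \frac{(1 - \epsilon(t)^{m_i})\,\epsilon(t)^{\max\{a,b\}-b}}{N(\epsilon(t)^{\max\{a,b\}-b} + \epsilon(t)^{\max\{a,b\}-a})} \ge \frac{\epsilon(t)^{m_i+m^*}}{2N} \ge \frac{\epsilon(t)^{m_i+m^*}}{N(5|\mathcal{C}|-1)}.$$

Similarly, for $t \ge t_0$, it holds that

$$\eta_3 = \frac{(1 - \epsilon(t)^{m_i})\,\epsilon(t)^{\max\{a,b\}-a}}{N(\epsilon(t)^{\max\{a,b\}-b} + \epsilon(t)^{\max\{a,b\}-a})} \ge \frac{\epsilon(t)^{m_i+m^*}}{N(5|\mathcal{C}|-1)}.$$

Since $m_i \in (2m^*, Km^*]$ for all $i \in V$ and $Km^* > 1$, then for any feasible transition $z^1 \to z^2$ with $z^1 \neq z^2$, it holds that:

$$P^{\epsilon(t)}_{z^1 z^2} \ge \frac{\epsilon(t)^{(K+1)m^*}}{N(5|\mathcal{C}|-1)}$$

for all $t \ge t_0$. Furthermore, for all $t \ge t_0$ and all $z^1 \in \mathrm{diag}\,\mathcal{A}$, we have that:

$$P^{\epsilon(t)}_{z^1 z^1} = 1 - \frac{1}{N}\sum_{i=1}^{N}\epsilon(t)^{m_i} = \frac{1}{N}\sum_{i=1}^{N}\bigl(1 - \epsilon(t)^{m_i}\bigr) \ge \frac{1}{N}\sum_{i=1}^{N}\epsilon(t)^{m_i} \ge \frac{\epsilon(t)^{(K+1)m^*}}{N(5|\mathcal{C}|-1)}.$$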

Choose $k_i := (D+1)i$ and let $i_0$ be the smallest integer such that $(D+1)i_0 \ge t_0$. Similar to (4.3), we can derive the following property:

$$\sum_{i=i_0}^{+\infty}\bigl(1 - \lambda(P(k_i,k_{i+1}))\bigr) \ge \frac{|\mathcal{B}'|}{(N(5|\mathcal{C}|-1))^{(D+1)(K+1)m^*}} \sum_{i=i_0}^{+\infty}\frac{1}{(D+1)(i+1)} = +\infty.$$

Hence, the weak ergodicity of $\{\mathcal{P}_t\}$ follows from Theorem 6.4. □

All the conditions in Theorem 6.5 hold. Thus it follows from Theorem 6.5 that the limiting distribution is $\mu^* = \lim_{t\to+\infty}\mu^t$. Note that $\lim_{t\to+\infty}\mu^t = \lim_{t\to+\infty}\mu(\epsilon(t)) = \mu(0)$ and Proposition 4.5 shows that the support of $\mu(0)$ is contained in the set $\mathrm{diag}\,S^*$. Hence, the support of $\mu^*$ is contained in the set $\mathrm{diag}\,S^*$, implying that $\lim_{t\to+\infty}\mathbb{P}(z(t) \in \mathrm{diag}\,S^*) = 1$. This completes the proof.

5. Conclusions. We have formulated a coverage optimization problem as a constrained potential game. We have proposed two payoff-based distributed learning algorithms for this coverage game and shown that these algorithms converge in probability to the set of constrained NEs and the set of global optima of a certain coverage performance metric, respectively.

6. Appendix. For the sake of a self-contained exposition, we include here some background in Markov chains [2] and the Theory of Resistance Trees [31].

6.1. Background in Markov chains. A discrete-time Markov chain is a discrete- 
time stochastic process on a finite (or countable) state space and satisfies the Markov 
property (i.e., the future state depends on its present state, but not the past states). 
A discrete-time Markov chain is said to be time-homogeneous if the probability of 
going from one state to another is independent of the time when the step is taken. 
Otherwise, the Markov chain is said to be time-inhomogeneous. 

Since time-inhomogeneous Markov chains include time-homogeneous ones as special cases, we will restrict our attention to the former in the remainder of this section. The evolution of a time-inhomogeneous Markov chain $\{\mathcal{P}_t\}$ can be described by the transition matrix $P(t)$, which gives the probability of traversing from one state to another at each time $t$.

Consider a Markov chain $\{\mathcal{P}_t\}$ with time-dependent transition matrix $P(t)$ on a finite state space $X$. Denote $P(m,n) := \prod_{t=m}^{n-1} P(t)$, $0 \le m < n$.

Definition 6.1 (Strong ergodicity [33]). The Markov chain $\{\mathcal{P}_t\}$ is strongly ergodic if there exists a stochastic vector $\mu^*$ such that for any distribution $\mu$ on $X$ and any $m \in \mathbb{Z}_+$, it holds that $\lim_{k\to+\infty}\mu^T P(m,k) = (\mu^*)^T$.

Strong ergodicity of $\{\mathcal{P}_t\}$ is equivalent to $\{\mathcal{P}_t\}$ being convergent in distribution and will be employed to characterize the long-run properties of our learning algorithm. The investigation of conditions under which strong ergodicity holds is aided by the introduction of the coefficient of ergodicity and weak ergodicity defined next.

Definition 6.2 (Coefficient of ergodicity [2]). For any $n \times n$ stochastic matrix $P$, its coefficient of ergodicity is defined as $\lambda(P) := 1 - \min_{1 \le i,j \le n} \sum_{k=1}^{n} \min(P_{ik}, P_{jk})$.
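
The definition translates directly into a few lines of code. The sketch below is a plain-Python illustration (the function name and the example matrices are hypothetical) that computes $\lambda(P)$ for a row-stochastic matrix given as a list of rows.

```python
def ergodicity_coefficient(P):
    """lambda(P) = 1 - min over row pairs (i, j) of sum_k min(P[i][k], P[j][k])."""
    n = len(P)
    overlap = min(
        sum(min(P[i][k], P[j][k]) for k in range(n))
        for i in range(n)
        for j in range(n)
    )
    return 1.0 - overlap

# Hypothetical example matrices.
identity = [[1.0, 0.0], [0.0, 1.0]]   # rows share no mass: lambda = 1
rank_one = [[0.3, 0.7], [0.3, 0.7]]   # identical rows:     lambda = 0
mixed = [[0.9, 0.1], [0.4, 0.6]]      # partial overlap

lam_identity = ergodicity_coefficient(identity)
lam_rank_one = ergodicity_coefficient(rank_one)
lam_mixed = ergodicity_coefficient(mixed)
print(lam_identity, lam_rank_one, lam_mixed)
```

The two extreme cases show the interpretation: $\lambda(P) = 0$ when all rows coincide, so the initial state is forgotten in one step, and $\lambda(P) = 1$ when some pair of rows has disjoint support.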

Definition 6.3 (Weak ergodicity [2]). The Markov chain $\{\mathcal{P}_t\}$ is weakly ergodic if $\forall x,y,z \in X$, $\forall m \in \mathbb{Z}_+$, it holds that $\lim_{k\to+\infty}(P_{xz}(m,k) - P_{yz}(m,k)) = 0$.

Weak ergodicity merely implies that $\{\mathcal{P}_t\}$ asymptotically forgets its initial state, but does not guarantee convergence. For a time-homogeneous Markov chain, there is no distinction between weak ergodicity and strong ergodicity. The following theorem provides a necessary and sufficient condition for $\{\mathcal{P}_t\}$ to be weakly ergodic.

Theorem 6.4 ([14]). The Markov chain $\{\mathcal{P}_t\}$ is weakly ergodic if and only if there is a strictly increasing sequence of positive numbers $k_i$, $i \in \mathbb{Z}_+$, such that $\sum_{i=0}^{+\infty}(1 - \lambda(P(k_i,k_{i+1}))) = +\infty$.
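
The divergence criterion can be checked numerically on a toy example. The sketch below assumes a hypothetical two-state chain with $P(t) = \begin{pmatrix} 1-\epsilon^2 & \epsilon^2 \\ \epsilon & 1-\epsilon \end{pmatrix}$ and $\epsilon(t) = 1/(t+2)$; neither is a chain or schedule from this paper. With $k_i = i$, each term $1 - \lambda(P(t))$ equals $\epsilon(t) + \epsilon(t)^2 \ge 1/(t+2)$, so the series diverges like a harmonic series, and the rows of $P(0,n)$ approach each other as $n$ grows.

```python
def step_matrix(t):
    """Hypothetical transition matrix P(t) with exploration rate eps(t) = 1/(t+2)."""
    eps = 1.0 / (t + 2)
    return [[1.0 - eps ** 2, eps ** 2], [eps, 1.0 - eps]]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def chain_product(m, n):
    """P(m, n) = P(m) P(m+1) ... P(n-1) for 0 <= m < n."""
    result = step_matrix(m)
    for t in range(m + 1, n):
        result = matmul(result, step_matrix(t))
    return result

def one_minus_lambda(P):
    """1 - lambda(P): the smallest overlap between any two rows of P."""
    n = len(P)
    return min(
        sum(min(P[i][k], P[j][k]) for k in range(n))
        for i in range(n)
        for j in range(n)
    )

def row_gap(P):
    """Largest entrywise difference between the two rows of a 2 x 2 matrix."""
    return max(abs(P[0][z] - P[1][z]) for z in range(2))

partial_sum = sum(one_minus_lambda(step_matrix(t)) for t in range(1000))
gap_short = row_gap(chain_product(0, 10))
gap_long = row_gap(chain_product(0, 1000))
print(partial_sum, gap_short, gap_long)
```

A finite partial sum cannot prove divergence, of course; the point is only to show the two quantities in Theorem 6.4 and Definition 6.3 moving together.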

We are now ready to present the sufficient conditions for strong ergodicity of the Markov chain $\{\mathcal{P}_t\}$.

Theorem 6.5 ([14]). A Markov chain $\{\mathcal{P}_t\}$ is strongly ergodic if the following conditions hold:

(B1) The Markov chain $\{\mathcal{P}_t\}$ is weakly ergodic.

(B2) For each $t$, there exists a stochastic vector $\mu^t$ on $X$ such that $\mu^t$ is the left eigenvector of the transition matrix $P(t)$ with eigenvalue 1.

(B3) The eigenvectors $\mu^t$ in (B2) satisfy $\sum_{t=0}^{+\infty}\sum_{z\in X}|\mu_z^t - \mu_z^{t+1}| < +\infty$.

Moreover, if $\mu^* = \lim_{t\to+\infty}\mu^t$, then $\mu^*$ is the vector in Definition 6.1.

6.2. Background in the Theory of Resistance Trees. Let $P^0$ be the transition matrix of the time-homogeneous Markov chain $\{\mathcal{P}_t^0\}$ on a finite state space $X$. And let $P^\epsilon$ be the transition matrix of a perturbed Markov chain, say $\{\mathcal{P}_t^\epsilon\}$. With probability $1 - \epsilon$, the process $\{\mathcal{P}_t^\epsilon\}$ evolves according to $P^0$, while with probability $\epsilon$, the transitions do not follow $P^0$.

A family of stochastic processes $\{\mathcal{P}_t^\epsilon\}$ is called a regular perturbation of $\{\mathcal{P}_t^0\}$ if the following holds $\forall x,y \in X$:

(A1) For some $\varsigma > 0$, the Markov chain $\{\mathcal{P}_t^\epsilon\}$ is irreducible and aperiodic for all $\epsilon \in (0, \varsigma]$.

(A2) $\lim_{\epsilon\to0^+} P^\epsilon_{xy} = P^0_{xy}$.

(A3) If $P^\epsilon_{xy} > 0$ for some $\epsilon$, then there exists a real number $\chi(x \to y) \ge 0$ such that $\lim_{\epsilon\to0^+} P^\epsilon_{xy}/\epsilon^{\chi(x\to y)} \in (0,+\infty)$.

In (A3), $\chi(x \to y)$ is called the resistance of the transition from $x$ to $y$.
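
Condition (A3) says that $P^\epsilon_{xy}$ decays like $\epsilon^{\chi(x\to y)}$, so a resistance can be estimated numerically as the slope of $\log P^\epsilon_{xy}$ against $\log\epsilon$ for small $\epsilon$. The sketch below assumes a hypothetical transition probability $P(\epsilon) = \epsilon^2/(3(1+\epsilon))$, whose resistance is 2; the function names are made up for illustration.

```python
import math

def transition_prob(eps):
    """Hypothetical perturbed transition probability with resistance 2."""
    return eps ** 2 / (3.0 * (1.0 + eps))

def estimate_resistance(prob, eps=1e-6, factor=10.0):
    """Slope of log(prob) versus log(eps), measured between eps/factor and eps."""
    return math.log(prob(eps) / prob(eps / factor)) / math.log(factor)

chi = estimate_resistance(transition_prob)

# The limit in (A3): P(eps) / eps**chi should approach a finite positive constant.
limit_ratio = transition_prob(1e-8) / (1e-8) ** 2
print(chi, limit_ratio)
```

Using the ratio of two evaluations cancels the constant factor $1/3$, which would otherwise bias the naive estimate $\log P(\epsilon)/\log\epsilon$ noticeably even for very small $\epsilon$.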

Let $H_1, H_2, \dots, H_J$ be the recurrent communication classes of the Markov chain $\{\mathcal{P}_t^0\}$. Note that within each class $H_\ell$, there is a path of zero resistance from every state to every other. Given any two distinct recurrence classes $H_\ell$ and $H_k$, consider all paths which start from $H_\ell$ and end at $H_k$. Denote by $\chi_{\ell k}$ the least resistance among all such paths.

Now define a complete directed graph $\mathcal{G}$ where there is one vertex $\ell$ for each recurrent class $H_\ell$, and the resistance on the edge $(\ell,k)$ is $\chi_{\ell k}$. An $\ell$-tree on $\mathcal{G}$ is a spanning tree such that from every vertex $k \neq \ell$, there is a unique path from $k$ to $\ell$. Denote by $G(\ell)$ the set of all $\ell$-trees on $\mathcal{G}$. The resistance of an $\ell$-tree is the sum of the resistances of its edges. The stochastic potential of the recurrent class $H_\ell$ is the least resistance among all $\ell$-trees in $G(\ell)$.
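
For a small number of recurrent classes, the stochastic potential can be computed by brute force. The sketch below enumerates all $\ell$-trees of a hypothetical three-vertex resistance graph (the `RESISTANCE` values are made up for illustration): every vertex other than the root picks one outgoing edge, and the choice is an $\ell$-tree exactly when every vertex reaches the root.

```python
from itertools import product

# Hypothetical resistances chi[(l, k)] on the complete directed graph over classes 0, 1, 2.
RESISTANCE = {
    (0, 1): 2, (0, 2): 3,
    (1, 0): 1, (1, 2): 2,
    (2, 0): 1, (2, 1): 1,
}
VERTICES = (0, 1, 2)

def reaches_root(successor, start, root):
    """Follow successor pointers from start; True if the root is reached without cycling."""
    seen = set()
    node = start
    while node != root:
        if node in seen:
            return False
        seen.add(node)
        node = successor[node]
    return True

def stochastic_potential(root):
    """Least total resistance over all trees in which every vertex has a path to root."""
    others = [v for v in VERTICES if v != root]
    best = None
    for choice in product(VERTICES, repeat=len(others)):
        successor = dict(zip(others, choice))
        if any(v == successor[v] for v in others):
            continue  # self-loops are not edges of the graph
        if not all(reaches_root(successor, v, root) for v in others):
            continue  # the choice contains a cycle that avoids the root
        cost = sum(RESISTANCE[(v, successor[v])] for v in others)
        best = cost if best is None else min(best, cost)
    return best

potentials = {v: stochastic_potential(v) for v in VERTICES}
stable = [v for v in VERTICES if potentials[v] == min(potentials.values())]
print(potentials, stable)
```

The enumeration is exponential in the number of classes, so it is only meant to make the definition concrete; the proofs in Section 4 avoid it by comparing trees through edge swaps.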

Theorem 6.6 ([31]). Let $\{\mathcal{P}_t^\epsilon\}$ be a regular perturbation of $\{\mathcal{P}_t^0\}$, and for each $\epsilon > 0$, let $\mu(\epsilon)$ be the unique stationary distribution of $\{\mathcal{P}_t^\epsilon\}$. Then $\lim_{\epsilon\to0^+}\mu(\epsilon)$ exists and the limiting distribution $\mu(0)$ is a stationary distribution of $\{\mathcal{P}_t^0\}$. The stochastically stable states (i.e., the support of $\mu(0)$) are precisely those states contained in the recurrence classes with minimum stochastic potential.
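
Theorem 6.6 can be sanity-checked on a hypothetical two-state chain (not one of the chains in this paper) with $P^\epsilon = \begin{pmatrix} 1-\epsilon^2 & \epsilon^2 \\ \epsilon & 1-\epsilon \end{pmatrix}$. For $\epsilon = 0$ both states are recurrent classes. Leaving state 0 has resistance 2 and leaving state 1 has resistance 1, so the tree rooted at state 0 has resistance 1 and the tree rooted at state 1 has resistance 2. State 0 therefore has the minimum stochastic potential and should carry all of the limiting mass.

```python
def perturbed_matrix(eps):
    """Hypothetical regular perturbation of the 2 x 2 identity matrix."""
    return [[1.0 - eps ** 2, eps ** 2], [eps, 1.0 - eps]]

def stationary_two_state(P):
    """Stationary distribution of an irreducible 2-state chain: mu = (b, a) / (a + b)."""
    a, b = P[0][1], P[1][0]   # a: probability of leaving state 0, b: of leaving state 1
    return [b / (a + b), a / (a + b)]

for eps in (0.1, 0.01, 0.001):
    print(eps, stationary_two_state(perturbed_matrix(eps)))

P_small = perturbed_matrix(1e-3)
mu_small = stationary_two_state(P_small)
```

In closed form $\mu(\epsilon) = (1/(1+\epsilon),\ \epsilon/(1+\epsilon))$, which tends to $(1, 0)$ as $\epsilon \to 0^+$, in agreement with the theorem.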